SUMMARY OF BULLETIN No. 119.
Three conceptions of type should rest in the breeder's mind : 

1. The ideal, or standard for selection; attained by few individ- 
uals, perhaps by none. Page 2 

2. The mode, or prevailing type as represented by the highest 
proportion of what the breeder actually produces. Page 6 

3. The mean, or average of all the breeder produces. Page 7 

Variability is deviation from type. It is best indicated by the 
standard deviation, a mathematical expression involving the devi- 
ation of every individual. Page 11 

Variability may be reckoned from the mean, the mode, the selec- 
tion standard, or any other desired basis. Page 12 

The coefficient of variability is a purely abstract expression for 
variability, so that by its means the variability of one character may 
be compared with that of another either in the same or different 
races. Page 12 

The effect of selection is to shift the type without greatly reduc- 
ing variability. Page 17 

Each character of every race has a variability that is natural, 
and this variability cannot be greatly reduced by selection. Page 19 

The indirect effect of selection is to influence physical or other 
characters correlated with those selected. Page 20 

The type of ear is directly affected by fertility, so far as length, 
circumference and weight are concerned, but not as to number of 
rows. Page 21 

Variability is slightly less on fertile land than on lands giving 
lower yields. Page 24 

The breeder of the future will be a statistician and a book- 
keeper. 



TYPE AND VARIABILITY IN CORN 

BY EUGENE DAVENPORT, PROFESSOR OF THREMMATOLOGY,
AND HENRY L. RIETZ, STATISTICIAN

The purpose of the present bulletin is to outline and define a 
clearer conception of type and variability than commonly rests in 
the breeder's mind, and to present certain data showing conditions 
that influence type and variability in corn. 

TYPE AND VARIABILITY IN GENERAL 

The subject is treated by the statistical method, now every- 
where employed for the study of the more complicated questions in 
variation and heredity. 1 This method was first used by Galton in 
his study of stature of English people (See Natural Inheritance) 
and afterward elaborated by Pearson and others and applied to the 
study of heredity problems generally. No excuse is offered for em- 
ploying the method of treatment here, because it is the only proper 
one for these purposes and because the time has come when breeders 
generally are expected to be somewhat familiar with this method of 
study. The reader is therefore urged not to pass by this form of 
study because it may happen to be new and unfamiliar. 

The technical terms and conceptions, such as standard deviation 
and coefficient of variability, are no more difficult than are interest 
and percentage, and a little careful attention will enable the reader 
to become fully acquainted, not only with their meaning and the 
method of determination, but with the larger conceptions of hered- 
ity that come with their habitual use. 

WHAT IS MEANT BY TYPE

A farmer plants corn from an ear, say ten inches in length. 
What he gets is not a crop of ears all ten inches long, nor of any 
other even length, but rather a mass of ears ranging in length all the 
way from three or four inches up to perhaps eleven or twelve, 
and very unevenly distributed between the extremes. The same 

1 For a more complete statement of this method of study of breeding problems the
reader is referred to chapters X and XI of Pearson's "Grammar of Science," published by 
A. and C. Black, London, or Part III of Davenport's "Principles of Breeding," Ginn and 
Company, Boston. 
principle would have held if the ear planted had been nine inches
long instead of ten except that the distribution would have been 
different, lengths running in general slightly lower; that is to say 
the length of ear in the offspring is not the same as that of the par- 
ent but it constitutes a "distribution" extending both above and 
below that length. 

So far as known this principle of transmission holds true in all
races and for all characters. Stated in more general terms, applying
to all breeding, we may say that the offspring as a whole is not
the same as the immediate parents but it constitutes a distribution
extending from near the lower to approximately the upper limits of
the race. This suggests at once the idea of type and that deviation
from type which we call variability.

What now is our conception of type? If ten inch ears will not 
bring ten inch ears but something else, and not only something else 
but a considerable variety of lengths; and if what we get extends 
both above and below the parent, then we arrive at once at a double 
conception as to type ; that is to say the type of the offspring is not 
the same as that of the parent. The type of the parent is very def- 
inite, representing an ideal ; but if the offspring is distributed both 
above and below that ideal, some better and some not so good, then 
a close analysis of the real character of that offspring becomes nec- 
essary in order to make any just comparison between the two or to 
arrive at any adequate conception of type in a mixed population, 
even in one arising from a selected ancestry. 

A concrete case will serve best to illustrate the principle involved. 
In the year 1906, some Leaming corn was raised on good ground
from seed ears of ten inches in length. A "random sample" 1 of 
this crop consisting of 327 ears gave the following distribution as 
to length : 

One ear was 3.0 inches long; one was 4.0 inches; two were 
5.0 inches; three were 5.5 inches; nine were 6.0 inches ; eight were 
6.5 inches; twelve were 7.0 inches; nineteen were 7.5 inches; 
thirty-two were 8.0 inches ; forty were 8.5 inches ; sixty-seven were 
9.0 inches; sixty-three were 9.5 inches; thirty-eight were 10 inches; 



1 By a random sample is meant a sufficient portion of the whole and taken so much at
random as to fairly represent the entire crop, or total "population" as the technical phrase
goes. Statistical problems were first studied with reference to people and the term population
was thus a natural one. As the studies have been extended to other fields, even of inanimate
nature, we still retain the old terms and "population" in this sense is applicable as
well to animals as to men; to bricks or stems as to either.

twenty-one were 10.5 inches; eight were 11.0 inches; two were
11.5 inches, and one was 12.0 inches long. 1

Put in tabular form as it appears in actual work we have the
following: 2

LENGTH OF EARS      NO. OF EARS
OR VALUE (V)        OR FREQUENCY (f)

      3.0                  1
      3.5                  0
      4.0                  1
      4.5                  0
      5.0                  2
      5.5                  3
      6.0                  9
      6.5                  8
      7.0                 12
      7.5                 19
      8.0                 32
      8.5                 40
      9.0                 67
      9.5                 63
     10.0                 38
     10.5                 21
     11.0                  8
     11.5                  2
     12.0                  1

Here we have a "frequency distribution" representing the entire 
"population" or crop, and as it lies spread out before the eye a 
glance is sufficient to afford considerable information as to the pre- 
vailing type. 

1 Measurements might be taken at quarter inches with a seeming higher degree of accuracy,
but repeated trials show that the same final results follow whether measurements are
taken at the quarter inch or at the half inch. The main point is that the numbers be sufficient
and that the sample be representative. Judgment must dictate as to the accuracy of
the sample, but the number depends upon the degree of reliability desired. This matter will
be fully discussed in the appendix under probable error, but experience shows that in
studies with corn excellent results can be gotten with 300 to 400 ears, and very fair results
may generally be had from half that number.

2 This is the most convenient form in which to make the record. A mark is
made for every individual examined, after which the additions are made. The
totals constitute the "frequency distribution" and each group (as 12, 19, etc.) is known
as a "class," and its measurement (as 7, 7.5, etc.) is known as the class mark or value.
It will be noted at once that there are more ears 9.0 inches long
than of any other length and that the distribution decreases in both 
directions from this "highest frequency." 

The Mode. This highest frequency or most common length 
is called the mode. It shows clearly what is the prevailing type as 
to length in the crop, as distinct from the selection type in the 
seed ear. This mode represents the value or measurement that is of 
the most common occurrence, and it is held by statisticians and by
students generally to be the best obtainable single expression for 
type. When it is ascertained, therefore, we know at once what one 
might conceive as the natural type of the race or variety so far as 
the character in question is concerned. When this is determined for 
a number of important characters we shall have a good knowledge 
of racial type as a whole. Thus we might after the same manner 
obtain the mode for circumference, number of rows, weight of ear, 
color of grain, percent of corn to cob, or any other desired charac- 
ter. Having done so a typical ear of this variety could be definitely 
described. We thus arrive at an accurate idea of type and of its 
definite measurement as well. 1 
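
The reading of the mode from a frequency distribution may be sketched in a few lines of Python. This is an illustration only, not part of the original bulletin; the names `frequency` and `empirical_mode` are chosen here for convenience, and the data are the 327 ears tabulated above.

```python
# Frequency distribution of ear length (inches) for the 327-ear sample.
# Class values run from 3.0 to 12.0 inches by half inches.
values = [3.0 + 0.5 * i for i in range(19)]
freqs = [1, 0, 1, 0, 2, 3, 9, 8, 12, 19, 32, 40, 67, 63, 38, 21, 8, 2, 1]
frequency = dict(zip(values, freqs))


def empirical_mode(freq):
    """Return the class value that carries the highest frequency."""
    return max(freq, key=freq.get)


total_ears = sum(frequency.values())
mode_length = empirical_mode(frequency)
print(f"{total_ears} ears; most common length {mode_length} inches")
```

The same function would serve for circumference, number of rows, or any other character recorded as a frequency distribution.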

For the purpose of comparing the variability of races we use the 
"coefficient of variability" to be described later. The modal co- 
efficient is chiefly valuable for comparing one type with another 
within the race, which is all that is required in practical breeding. 

Practical Value of the Frequency Distribution, the Mode, and 
Modal Coefficient. The practical importance of the information 
afforded by these values must be apparent. By means of the 
frequency distribution the breeder is enabled at any time when he 
can secure sufficient numbers, to spread out before his eyes a good 
and fair representation of the whole population of the variety or 
race he is breeding, and in respect to any character which he can 
measure or accurately estimate. 



1 The Empirical and the Theoretical Mode. It is evident by inspection of the frequency
table that if measurements had been taken at the quarter inch, or some less fraction, the
highest frequency would have fallen not at the nine inch point, but slightly above it, say at
9.25 for example, for the next frequency above (63) is greater than the next one below (40);
that is to say the mode is to some extent dependent upon the scheme of measurements
adopted. A mode so determined is therefore only a close approximation to the actually most
common length, and it is known as the empirical mode. If, however, the theoretical curve
should be platted then all values would be accurately represented (see appendix) and the
highest point in this curve would be the actual, or as it is called, the theoretical mode. In
practical breeding operations the empirical mode arising from convenient measurements is
sufficiently accurate. It leads to no error because a convenient scheme of measurements
is generally employed by all observers, so that empirical modes are comparable.
Thus the scheme of half inch measurements is the one likely to be universally employed
for corn.
When he has ascertained its mode he knows what is the natural
type, for mode indicates type ; and he then knows by how much, if 
any, it differs from the type which he has chosen as the standard for 
selection. By this he may judge whether and to what extent he is 
operating at variance with nature. 

The Mean. There is still another conception of type as to this 
distribution, and that is the average or mean as it is technically 
called. It will be noted that the distribution does not decline uni- 
formly both above and below the mode; that is to say there are 
twelve values below and only six above, from which we conclude 
that the average length of ear is somewhat different from the most 
usual length. By multiplying each separate length by the number 
of ears of that length and adding the products, (or, what is the
same thing, adding together the lengths of all the ears) then divid- 
ing by the total number of ears we find the average or mean length 
to be 8.83 inches. 

Accordingly we have the following for the determination of the 
mean. 1 Multiply each value by its frequency, add the results and 
divide the sum by the number of individuals or variates. 

Applying this principle to the case in hand we have: 2

    V          f          fV

   3.0   ×     1    =      3.0
   3.5   ×     0    =      0.0
   4.0   ×     1    =      4.0
   4.5   ×     0    =      0.0
   5.0   ×     2    =     10.0
   5.5   ×     3    =     16.5
   6.0   ×     9    =     54.0
   6.5   ×     8    =     52.0
   7.0   ×    12    =     84.0
   7.5   ×    19    =    142.5
   8.0   ×    32    =    256.0
   8.5   ×    40    =    340.0
   9.0   ×    67    =    603.0
   9.5   ×    63    =    598.5
  10.0   ×    38    =    380.0
  10.5   ×    21    =    220.5
  11.0   ×     8    =     88.0
  11.5   ×     2    =     23.0
  12.0   ×     1    =     12.0

             327         2887.0

2887.0 ÷ 327 = 8.83, the mean length of ear in inches.



1 By "mean" is here meant the "arithmetical average" which is the average most commonly
accepted.

2 In this table "V" stands for "values" or "magnitudes," in this case length, and "f"
stands for frequency, or the number of variates (ears) of each separate class. The heading
"fV" means the products of the values (lengths) multiplied by the corresponding frequencies.
Here we have a third valuation for type (8.83) representing the
average as distinct from 9.0 of the highest frequency representing 
the most usual length, and both distinct from the 10 inches of the 
ear planted. 
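
The rule for the mean (multiply each value by its frequency, add, and divide by the number of variates) can be checked with the short Python sketch below. It is offered only as an illustration; the function name `weighted_mean` is an assumption of this sketch and does not come from the bulletin.

```python
# Class values (inches) and frequencies for the 327-ear sample.
values = [3.0 + 0.5 * i for i in range(19)]
freqs = [1, 0, 1, 0, 2, 3, 9, 8, 12, 19, 32, 40, 67, 63, 38, 21, 8, 2, 1]


def weighted_mean(values, freqs):
    """Sum of value * frequency, divided by the total frequency."""
    total = sum(freqs)
    sum_fv = sum(v * f for v, f in zip(values, freqs))
    return sum_fv / total


sum_fv = sum(v * f for v, f in zip(values, freqs))
mean_length = weighted_mean(values, freqs)
print(f"sum of fV = {sum_fv}, mean = {mean_length:.2f} inches")
```

The mean so found (8.83 inches to two decimals) stands beside the mode of 9.0 inches and the 10 inches of the seed ear.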

Practical Use of the Mean. The mean gives a good average 
value of the character, and establishes the practical or commercial 
value of a race or variety, for it shows what it will do on the aver- 
age. It is not always, however, a good index of the prevailing type, 
for as often happens, the variety with the higher mean may have 
the lower mode. Neither is the mean always a good index of con- 
ditions ; for example, in a population of one thousand paupers and 
one millionaire, the mean wealth is fair, but the type is clearly that 
of the pauper. 
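
The illustration of the paupers and the millionaire may be put in figures. The sketch below assumes, purely for the sake of the example, that each pauper owns nothing and that the millionaire owns exactly one million dollars; neither figure is given in the text.

```python
from statistics import mean, mode

# Assumed wealth figures (hypothetical): 1000 paupers at 0 dollars
# and one millionaire at 1,000,000 dollars.
wealth = [0] * 1000 + [1_000_000]

mean_wealth = mean(wealth)   # total wealth shared over 1001 persons
modal_wealth = mode(wealth)  # the most common single value

print(f"mean wealth  = {mean_wealth:.0f} dollars")
print(f"modal wealth = {modal_wealth} dollars")
```

Under these assumed figures the mean is close to a thousand dollars per person, while the mode, which indicates the type, remains at nothing.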

Here then are three separate and very definite conceptions of 
type, all of which have distinct applications to the practical affairs 
of breeding: 1. The ideal, which is used in selecting the parentage.
2. The prevailing type of the offspring as represented by the highest 
frequency (the mode). 3. The average of the offspring as repre- 
sented by the mean. These distinctions apply not only to length 
of ears in corn, but to all characters and all races ; that is, to breed- 
ing in general. 

The breeder of pedigreed stock is interested primarily in the 
ideal and in the mode or highest frequency, while the general far- 
mer who multiplies or raises it for the open market is most inter- 
ested in the mean or average production. 1 

VARIABILITY, OR DEVIATION FROM TYPE 

Having established definite distinctions as to type the student of 
breeding problems should form equally clear conceptions as to 
deviation from type, commonly known as variability. 2

1 It is to be noted that the generation to which the selected parent belonged had also its
own mode and mean which may have been quite different from those of the offspring.

2 The term variability should not be understood as expressing departure in the sense of
wandering from a fixed standard. Students sometimes gain the impression that if the law
of heredity were infallible all offspring would be of a common type, and that any departure
from the type of the race, variety, or breed is to be regarded as by so much a failure of heredity
and a concession to variation.

The truth is that all transmission is heterogeneous in the sense that the individuals of
any race, whether parents or offspring, belong not to a fixed type but to a frequency distribution
similar to the one now under discussion, and the idea of type arises out of the distribution.

The chief conception to rest in the mind of the breeder is that whatever the parentage,
the offspring will constitute a distribution extending through a considerable range, and that
the parent itself also belonged to and was drawn from some portion of a frequency distribution
that is not very different from that of the race in general.

Variability is therefore not the opponent of heredity but its inevitable accompaniment
in transmission and our problem is to devise methods of accurately measuring and expressing
its range and extent in any particular instance.
In the study of variability it is worse than useless to study a few
scattered individuals here and there. What we seek is a measure of 
what may be called the average tendency to deviate from type. 
Some individuals deviate but little, others more, and still others 
very much; and we seek a measure of this non-conformity to type. 
To find this we must study groups of individuals sufficiently large 
to be representative of their race. This brings us back to the fre- 
quency distribution and what it can teach as to variability. 

Again the concrete serves well as a medium of teaching a prin- 
ciple. In this connection we refer once more to our distribution of 
327 ears and note that every ear in the lot deviates somewhat from 
the mean of 8.83 inches. The range and extent of this deviation 
are shown in the following table, column D. 1

    V         f         D

   3.0        1       -5.83
   3.5        0       -5.33
   4.0        1       -4.83
   4.5        0       -4.33
   5.0        2       -3.83
   5.5        3       -3.33
   6.0        9       -2.83
   6.5        8       -2.33
   7.0       12       -1.83
   7.5       19       -1.33
   8.0       32       -0.83
   8.5       40       -0.33
   9.0       67        0.17
   9.5       63        0.67
  10.0       38        1.17
  10.5       21        1.67
  11.0        8        2.17
  11.5        2        2.67
  12.0        1        3.17

            327

The practical question now is to reduce this column of deviations
to a single expression denoting the variability of the population
of which this distribution is representative. Manifestly when
this is done the variability of this distribution can be compared di- 
rectly with that of any other distribution, and at the present or any 
future time. Two methods of procedure are possible in thus se- 
curing a kind of general expression for the average amount of 
deviation, giving rise to two similar but slightly different values; 
viz., the average deviation and the standard deviation. 

The Average Deviation. If each deviation (column D) repre- 
sented an equal number of ears this "single expression" could be 
readily derived by adding the deviations and dividing by the total 
number. But these deviations do not represent equal numbers of 
ears. The deviation 5.83, for example, represents but one ear 

1 "D" indicates the deviation of the several classes from the common mean of the population,
8.83 inches. Thus the first ear deviates the difference between 3 inches and 8.83
inches, or -5.83, and being below it is written with the negative sign. Also, for example,
the 21 ears 10.5 inches long deviate 10.5 - 8.83 or 1.67 inches from the mean and being above
the mean we write it with the positive sign, and similarly for other values.
while no less than twelve ears deviated 1.83 inches below the mean
and two ears deviated 2.67 inches above, with others unevenly dis- 
tributed. 

Manifestly each deviation should first be multiplied by the number
of ears involved, thus: 1

     1  ×  5.83  =    5.83
     0  ×  5.33  =    0.00
     1  ×  4.83  =    4.83
     0  ×  4.33  =    0.00
     2  ×  3.83  =    7.66
     3  ×  3.33  =    9.99
     9  ×  2.83  =   25.47
     8  ×  2.33  =   18.64
    12  ×  1.83  =   21.96
    19  ×  1.33  =   25.27
    32  ×  0.83  =   26.56
    40  ×  0.33  =   13.20
    67  ×  0.17  =   11.39
    63  ×  0.67  =   42.21
    38  ×  1.17  =   44.46
    21  ×  1.67  =   35.07
     8  ×  2.17  =   17.36
     2  ×  2.67  =    5.34
     1  ×  3.17  =    3.17

   327              318.41

The result of this calculation is that the total deviation of all
the 327 ears from their average length is 318.41 inches, some above
and some below the mean. 2 If now we divide 318.41 by 327, the
number of ears involved, we have 0.97+ inches, which is a good
expression for the average deviation of this particular population.
If another variety should give a larger quotient we should conclude
it to be more variable. In this manner we may reduce the variability
of a whole population to a single expression.

Standard Deviation. Mathematicians have another method of
calculating variability. It differs from the one just discussed in
only one detail; viz., the deviations are squared before multiplying
by their respective frequencies, thus:

    V       f       D         D² 3       D²f 4

   3.0      1     -5.83     33.9889     33.9889
   3.5      0     -5.33     28.4089      0.0000
   4.0      1     -4.83     23.3289     23.3289
   4.5      0     -4.33     18.7489      0.0000
   5.0      2     -3.83     14.6689     29.3378
   5.5      3     -3.33     11.0889     33.2667
   6.0      9     -2.83      8.0089     72.0801
   6.5      8     -2.33      5.4289     43.4312
   7.0     12     -1.83      3.3489     40.1868
   7.5     19     -1.33      1.7689     33.6091
   8.0     32     -0.83      0.6889     22.0448
   8.5     40     -0.33      0.1089      4.3560
   9.0     67      0.17      0.0289      1.9363
   9.5     63      0.67      0.4489     28.2807
  10.0     38      1.17      1.3689     52.0182
  10.5     21      1.67      2.7889     58.5669
  11.0      8      2.17      4.7089     37.6712
  11.5      2      2.67      7.1289     14.2578
  12.0      1      3.17     10.0489     10.0489

          327                          538.4103



1 When the variability is to be obtained in this way the minus sign is disregarded. 

2 The reader will note that this total 318.41 is exactly what would have resulted if we
had added the deviations of each separate ear of the entire 327 measured from their average
length, 8.83.

3 The column marked D² is secured by squaring the various deviations, thus eliminating
the minus sign. For example, -5.83 × -5.83 = 33.9889, etc., etc.

4 The column marked D²f is obtained by multiplying the squared deviations, each by its
respective frequency, on the same principle as before. For example, 8.0089 × 9 = 72.0801,
the seventh number down the last column, corresponding to the frequency 9 and the deviation
-2.83.
Dividing 538.4103 by 327 after the manner of finding the average
deviation we have the quotient 1.6465, but as the deviations
have all been squared during the operation it is necessary to extract 
the square root of this number in order to arrive at the units in 
which the measurements were taken. The square root of 1.6465 is 
1.28+, and this is the so-called standard deviation of the mathema- 
tician. 

Hence to find the standard deviation we have : Find the devia- 
tion of each frequency from the mean; square each deviation, and 
multiply by its corresponding frequency; add the products, divide 
by the total number of variates and extract the square root. 
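
The rule just stated can be followed step by step in Python. The sketch below is an illustration under one stated difference from the text: it carries the mean at full precision instead of rounding it to 8.83, so the intermediate quotient agrees with 1.6465 only to about four decimals. The variable names are assumptions of this sketch.

```python
import math

# Class values (inches) and frequencies for the 327-ear sample.
values = [3.0 + 0.5 * i for i in range(19)]
freqs = [1, 0, 1, 0, 2, 3, 9, 8, 12, 19, 32, 40, 67, 63, 38, 21, 8, 2, 1]

n = sum(freqs)
mean_length = sum(v * f for v, f in zip(values, freqs)) / n

# Square each deviation from the mean and multiply by its frequency.
sum_d2f = sum(f * (v - mean_length) ** 2 for v, f in zip(values, freqs))

# Divide by the number of variates, then extract the square root.
mean_square = sum_d2f / n
standard_deviation = math.sqrt(mean_square)

print(f"mean square deviation = {mean_square:.4f}")
print(f"standard deviation    = {standard_deviation:.2f} inches")
```

Note that the divisor is the whole number of variates (327), as in the text, and not one less than that number as in some later statistical practice.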

Shortening the Method. The calculations just described neces- 
sarily involve large decimals. These large decimals can be avoided 
and the process of finding both the mean and the standard deviation 
can be very much shortened by assuming as a mean the nearest 
probable measurement as determined by inspection of the frequency 
distribution, and afterward applying the necessary correction. For 
example, in the present instance, we should judge by inspection that 
the mean cannot be far from 9.0. 1 This we infer from the fact that
the distribution reduces both ways from this point and quite evenly. 
Proceeding with this assumption, denoting our "guess" by G and, 
reckoning deviation provisionally from this point, we have the fol- 
lowing, using exactly the same methods as before : 2 



1 The advantage of assuming this value from which to reckon deviation lies in the fact
that it is exact and contains but one decimal, while the true mean has at least two decimal
places, making relatively large numbers to deal with.

2 The following table will be found useful for obtaining the squares of numbers containing
only two significant figures, as 99 or 9.9, correct to three significant figures.

SQUARES OF NUMBERS.

        .0     .1     .2     .3     .4     .5     .6     .7     .8     .9

  1.   1.00   1.21   1.44   1.69   1.96   2.25   2.56   2.89   3.24   3.61
  2.   4.00   4.41   4.84   5.29   5.76   6.25   6.76   7.29   7.84   8.41
  3.   9.00   9.61   10.2   10.9   11.6   12.2   13.0   13.7   14.4   15.2
  4.   16.0   16.8   17.6   18.5   19.4   20.2   21.2   22.1   23.0   24.0
  5.   25.0   26.0   27.0   28.1   29.2   30.2   31.4   32.5   33.6   34.8
  6.   36.0   37.2   38.4   39.7   41.0   42.2   43.6   44.9   46.2   47.6
  7.   49.0   50.4   51.8   53.3   54.8   56.2   57.8   59.3   60.8   62.4
  8.   64.0   65.6   67.2   68.9   70.6   72.2   74.0   75.7   77.4   79.2
  9.   81.0   82.8   84.6   86.5   88.4   90.2   92.2   94.1   96.0   98.0





    V       f      V-G     f(V-G)    (V-G)²    f(V-G)²

   3.0      1     -6.0      -6.0     36.00      36.00
   3.5      0     -5.5       0.0     30.25       0.00
   4.0      1     -5.0      -5.0     25.00      25.00
   4.5      0     -4.5       0.0     20.25       0.00
   5.0      2     -4.0      -8.0     16.00      32.00
   5.5      3     -3.5     -10.5     12.25      36.75
   6.0      9     -3.0     -27.0      9.00      81.00
   6.5      8     -2.5     -20.0      6.25      50.00
   7.0     12     -2.0     -24.0      4.00      48.00
   7.5     19     -1.5     -28.5      2.25      42.75
   8.0     32     -1.0     -32.0      1.00      32.00
   8.5     40     -0.5     -20.0      0.25      10.00
   9.0     67      0.0       0.0      0.00       0.00
        Sum of negative products   -181.0
   9.5     63      0.5      31.5      0.25      15.75
  10.0     38      1.0      38.0      1.00      38.00
  10.5     21      1.5      31.5      2.25      47.25
  11.0      8      2.0      16.0      4.00      32.00
  11.5      2      2.5       5.0      6.25      12.50
  12.0      1      3.0       3.0      9.00       9.00
        Sum of positive products    125.0

          327     Difference 56.0              548.00



This method gives us both the mean and standard deviation.
Considering first the mean: In column f(V-G) we find that after
multiplying the deviations from our assumed mean (9.0) by their
respective frequencies, the sum of the negative products (-181.0)
exceeds the sum of the positive products (125.0) by 56.0; that is
the algebraic sum of the products is -56.0. Our assumed mean is
therefore too high by the amount of -56.0 ÷ 327 = -0.171. 1 We
then reduce our assumed mean by this amount (9.0 - 0.171 =
8.829) and arrive at the true mean 8.83-. 2

Considering next the standard deviation: In column f(V-G)²
we have 548.00 as the sum of the products of the several frequencies
into the squares of their respective deviations from the assumed
mean, derived on the same plan as when working from the true mean D.

Dividing by the total number (327) we have 548.00 ÷ 327 =
1.6758, corresponding to the quotient, 538.4103 ÷ 327 = 1.6465
of the previous calculation when working from the true mean.

The correction made in the mean was -0.171, but as we are now
dealing with the second powers it seems but natural that this amount
be squared before it be taken from the quotient 1.6758. 3 The
square of -0.171 is 0.029241 or 0.0292+. We have therefore after
applying this correction 1.6758 - 0.0292+ = 1.6466.
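
The reason the square of the correction is to be subtracted can be set down briefly. The following is a sketch of the usual argument, not a derivation taken from the bulletin; the symbols N (the number of variates), M (the true mean), G (the assumed mean) and \sigma (the standard deviation) are notation assumed here.

```latex
% Assumed notation: N = \sum f, M = true mean, G = assumed mean,
% \sigma = standard deviation reckoned from the true mean.
\begin{align*}
M - G &= \frac{1}{N}\sum f\,(V-G) = \frac{-56.0}{327} = -0.171, \\
\sum f\,(V-G)^2 &= \sum f\,\bigl[(V-M) + (M-G)\bigr]^2 \\
  &= \sum f\,(V-M)^2 + 2\,(M-G)\sum f\,(V-M) + N\,(M-G)^2 \\
  &= \sum f\,(V-M)^2 + N\,(M-G)^2
     \qquad \text{since } \sum f\,(V-M) = 0, \\
\sigma^2 = \frac{1}{N}\sum f\,(V-M)^2
  &= \frac{1}{N}\sum f\,(V-G)^2 - (M-G)^2
   = 1.6758 - 0.0292 = 1.6466 .
\end{align*}
```

Because the correction enters as a square, it is subtracted whether the assumed mean lies above or below the true mean.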



1 We divide by the total number (327) because we are dealing with a column of products
arising from the introduction of the frequencies.

2 On the other hand should the sum of the positive deviations exceed the sum of the
negative deviations it would indicate that our assumed value is too small and we should add
the correction in order to arrive at the true mean.

3 This can be justified by a strictly mathematical proof. It is to be noted that in the case
of standard deviations the square of the correction is always to be subtracted.
This agrees very nearly with the value 1.6465 previously found,
but the shorter method is the more accurate because no decimals 
have been lost during the process. The square root of 1.6466 is 
1.28+, the standard deviation sought, agreeing with the former value 
and derived by a very much shorter method. The first method is 
useful for expounding the principles involved but the latter is far 
preferable for actual use, not only on account of its brevity but its 
increased accuracy as well. 
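
The shorter method may be sketched in Python as follows. The function name `mean_and_sd_from_guess` and its argument names are assumptions made for this illustration; the steps are those of the text: reckon deviations from the guess G, correct the mean by the average of those deviations, and subtract the square of that correction before extracting the root.

```python
import math

# Class values (inches) and frequencies for the 327-ear sample.
values = [3.0 + 0.5 * i for i in range(19)]
freqs = [1, 0, 1, 0, 2, 3, 9, 8, 12, 19, 32, 40, 67, 63, 38, 21, 8, 2, 1]


def mean_and_sd_from_guess(values, freqs, guess):
    """Mean and standard deviation reckoned from an assumed mean."""
    n = sum(freqs)
    # Column f(V-G): algebraic sum of the weighted deviations.
    sum_first = sum(f * (v - guess) for v, f in zip(values, freqs))
    # Column f(V-G)^2: sum of the weighted squared deviations.
    sum_second = sum(f * (v - guess) ** 2 for v, f in zip(values, freqs))
    correction = sum_first / n
    mean = guess + correction
    variance = sum_second / n - correction ** 2
    return mean, math.sqrt(variance)


mean_9, sd_9 = mean_and_sd_from_guess(values, freqs, 9.0)
mean_8, sd_8 = mean_and_sd_from_guess(values, freqs, 8.0)
print(f"guess 9.0: mean {mean_9:.3f}, standard deviation {sd_9:.2f}")
print(f"guess 8.0: mean {mean_8:.3f}, standard deviation {sd_8:.2f}")
```

The second call, with a guess of 8.0, is included to show that the final values do not depend upon the guess; a guess near the mean merely keeps the figures small.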

The farmer is at liberty of course to choose whether he will use 
the average deviation or the standard deviation as an index of vari- 
ability. The average deviation is the simpler, but it is seldom used 
by mathematicians. As the results are different, generally smaller, 
they cannot be compared with those found in standard literature of 
this kind. 
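
That the average deviation generally falls below the standard deviation may be seen by computing both for the same 327 ears. The sketch below is illustrative only; it reckons both from the unrounded mean, and the variable names are assumptions of this example.

```python
import math

# Class values (inches) and frequencies for the 327-ear sample.
values = [3.0 + 0.5 * i for i in range(19)]
freqs = [1, 0, 1, 0, 2, 3, 9, 8, 12, 19, 32, 40, 67, 63, 38, 21, 8, 2, 1]

n = sum(freqs)
mean_length = sum(v * f for v, f in zip(values, freqs)) / n

# Average deviation: signs disregarded, deviations weighted by frequency.
average_deviation = sum(
    f * abs(v - mean_length) for v, f in zip(values, freqs)
) / n

# Standard deviation: deviations squared before weighting.
standard_deviation = math.sqrt(
    sum(f * (v - mean_length) ** 2 for v, f in zip(values, freqs)) / n
)

print(f"average deviation  = {average_deviation:.2f} inches")
print(f"standard deviation = {standard_deviation:.2f} inches")
```

Squaring gives the larger deviations greater weight, which is why the standard deviation comes out the larger of the two for this distribution.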

The standard deviation, obtained by one of the two latter meth- 
ods, is strongly recommended. It is the one that will be used in all 
publications of this station. The breeder may employ either the 
shorter, or the longer and slightly less accurate method. The 
shorter method is far more convenient and is no more complicated 
except in making the correction and this, after a little practice, of- 
fers no difficulty. 

Practical Value of Standard Deviation. The standard devia- 
tion is a good measure of deviation from the mean. It is therefore 
a good measure of variability reckoned from that point. It is mani- 
fest that by the same methods we could calculate the deviation and 
express the variability from the mode, the selection standard, or any 
other type on which the mind might rest. 

The practical value of standard deviation is that it stands as a 
definite measure of variability of the population in question, and if 
records be kept the variability of any race may be compared from 
year to year. The advantage of being able to make comparisons of 
this sort is too obvious to require elaboration. 

Coefficient of Variability. It is often desirable to compare the 
variabilities of different characters measured in different units either 
within the same race or between separate races ; thus, which is 
more variable, the length, the circumference or the weight of ear? 
In such cases one standard deviation cannot be compared directly 
with another for two reasons : First one mean is very much larger 
than another, and second, they are of entirely different units, as 
inches and pounds, in which cases direct comparison is impossible. 
We seek, therefore, an abstract expression combining the idea both 
of standard deviation and type. Such an expression is known as
the coefficient of variability and is found as follows: Divide the
standard deviation by the mean as a base and the result will be an
excellent index of variability in the form of a rate percent. 

Thus for the case in question we have: 1.28 ÷ 8.83 = 0.145-,
indicating the variability of this population to be over 14.5 percent 
of its own mean. Here we have a mathematical expression for com- 
paring variability on an abstract basis, and by this means we can 
compare the variability of this population with that of any other 
from any race, plant or animal; and for any character of which 
accurate measurements can be made. 
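
The coefficient of variability for the ears under discussion may be computed as in the sketch below, which is an illustration and not part of the bulletin. The function name `coefficient_of_variability` is assumed here, and the mean and standard deviation are carried at full precision instead of being rounded to 8.83 and 1.28 first.

```python
import math

# Class values (inches) and frequencies for the 327-ear sample.
values = [3.0 + 0.5 * i for i in range(19)]
freqs = [1, 0, 1, 0, 2, 3, 9, 8, 12, 19, 32, 40, 67, 63, 38, 21, 8, 2, 1]


def coefficient_of_variability(values, freqs):
    """Standard deviation divided by the mean, as a rate percent."""
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    variance = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / n
    return 100.0 * math.sqrt(variance) / mean


cv_percent = coefficient_of_variability(values, freqs)
print(f"coefficient of variability = {cv_percent:.1f} percent")
```

Being a pure ratio, the coefficient is unchanged if the same ears are measured in centimeters instead of inches, which is what permits comparison between characters recorded in different units.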

For example, the coefficient of variability has been worked out
for a large number of characters in man as is shown in the following
table: 1

Nose length         9.49      Head length         2.44
Nose breadth        7.57      Head breadth        2.78
Nose height        15.2       Upper arm length    6.50
Forehead height    10.4       Forearm             3.85
Underjaw length     4.81      Upper leg           5.00
Mouth breadth       5.18      Lower leg           5.04
                              Foot                5.92

From this we note that the most variable character in these phy- 
sical measurements of man is the height of the nose from the plane 
of the face (15.2) and this is the only character that is as variable 
as is length of ear in the distribution now under discussion (14.5 
percent). It is manifest that the variability of the nose in man or 
of the weight of animals could not be directly compared one with 
another, because the units are different and because they are reck- 
oned on different means, but when variability is reduced to a co- 
efficient then direct comparison becomes entirely possible and intel- 
ligible. 

Practical Use of Statistical Constants. The practical advantage
to the breeder in being able to calculate the mode, mean, and variability
of the animals and plants he is breeding, and thus to know
definitely their behavior from generation to generation under his
methods of selection and treatment, is too obvious to need
discussion. Breeding operations in the past have lacked much in
definiteness because of the inability of breeders to possess them- 
selves of this class of knowledge or even to appreciate its bearing 
upon breeding operations. The successful breeder of the future will 
be a statistician and a bookkeeper. He will keep himself as ac- 
curately and as fully informed as may be as to the type and varia- 
bility in succeeding generations of the breeds and strains he attempts 
to improve, and he will know this of all important characters that 
can be subjected to any form of measurement. 

¹See Vernon, Variation in Animals and Plants, p. 24.

Manifestly the methods here given do not avail with characters
that cannot be subjected to measurement, nor can they be employed 
when it is impossible to find sufficient numbers to make the calcula- 
tions reliable. 

The characters that can be classified and measured, at least ap- 
proximately, are however, more numerous than might at first 
thought seem possible. Dimensions and weights are in most cases 
easily taken. Gains in weight, yield of milk, rate of speed, etc., are 
readily handled by the statistical methods, and even such characters 
as color, degree of intelligence, and the like, are not impossible of 
classification and approximate measurement. 

While most characters can thus be brought into the form for 
statistical treatment, it is useful to know that present-day knowledge
of breeding operations seems to indicate that all characters, whether
measurable or not, tend to behave after the same general principles
as to type and variability, so that we may confidently believe that 
every character of every individual belongs in some portion of a 
distribution whether the distribution could or could not be definitely 
written. 

PROBABLE ERROR 

Clearly no calculations based on a portion of the population can 
represent the entire race with absolute accuracy. If one more ear
had been measured it would have fallen somewhere in the scheme 
of distribution, and wherever it may have fallen it would have 
slightly changed our calculations. 

When we are able to examine all the individuals involved in a 
problem we can of course determine absolute values to within the 
limits of measurement, as in the average weight of a bunch of steers 
or the yield per acre of a field of grain. Our present discussion, 
however, is of a class of problems in which we can never hope to see 
and examine more than a fraction of the total population, as when 
we ask what is the average weight of steers the country over, or 
the average length or weight of ears of corn. 

In practice when dealing with this class of problems we can do 
no better than to take a random sample and assume it to be repre- 
sentative of the entire population, accepting whatever error may be 
involved, and there is always an error of some magnitude, for no 
random sample can be assumed as being completely representative 
of the entire race to which it belongs. 

Now no method can inform us as to the exact magnitude of this 
error. If it could we should at once correct for it and thus come 
into possession of the true value; but methods are known by which
we may judge fairly well of the degree of confidence which may
be reposed in results of this kind. These methods result in deducing
what is called the "probable error," written plus or minus E (±E).
The formulas for calculating E are derived by mathematical 
methods which are too complicated for discussion here, but which 
are briefly stated in the appendix, with the following results : 

1. Probable Error of Mean. The formula for probable error in
determinations for mean is

$$E = \frac{\text{Standard deviation}}{\sqrt{n}} \times 0.6745,$$

in which n is the number of variates examined and 0.6745 is a
mathematical constant. In words, it is, Multiply the standard
deviation by 0.6745 and divide by the square root of the number
of variates examined. Thus in the present instance by substituting
values for standard deviation and number we have

$$E = \frac{1.28}{\sqrt{n}} \times 0.6745 = 0.047.$$

This probable error is but a small fraction of the value determined (8.83) and indicates
that a high degree of confidence can be placed in the result. If, 
however, another calculation should show a smaller value for prob- 
able error we should conclude that a still higher degree of confi- 
dence could be placed in its accuracy. The reader will note that the 
number (n) is the only element of the formula that is under our 
control, others arising necessarily out of the problem. He will note, 
too, the overwhelming influence of numbers ; that as numbers in- 
crease the denominator increases and probable error decreases, and 
that when the number should reach infinity, E would become zero. 

2. Probable Error of Standard Deviation. The formula for
probable error in determinations for standard deviation is

$$E = \frac{\text{Standard deviation}}{\sqrt{2n}} \times 0.6745.$$

In words, it is, Multiply the standard deviation by 0.6745 and
divide by the square root of twice the number examined. Substituting
values for the case in hand we have as the probable error of
the standard deviation 1.28,

$$E = \frac{1.28}{\sqrt{2n}} \times 0.6745 = 0.034.$$

3. Probable Error of Coefficient of Variability. The formula
for probable error in determinations for coefficient of variability
below 10 percent is

$$E = \frac{\text{Coefficient of variability}}{\sqrt{2n}} \times 0.6745.$$

In words, it is, Multiply the coefficient of variability by
0.6745 and divide by the square root of twice the number. In the
case in hand, however, the coefficient of variability (14.5) is greater
than ten percent and in such cases the following formula is used:

$$E = 0.6745 \, \frac{C}{\sqrt{2n}} \left[ 1 + 2 \left( \frac{C}{100} \right)^{2} \right]^{1/2},$$

in which C is the coefficient of variability in percent, and which equals 0.39 as the
corrected probable error.
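The three rules above can be collected into a short Python sketch. It is offered only as an illustration: the function names are ours, the correction for coefficients above ten percent is taken in its usual form, the factor [1 + 2(C/100)²]^½, and the sample size n = 331 is an assumed value chosen for the example because the number of ears is not stated at this point in the text.

```python
import math

PROBABLE_ERROR_FACTOR = 0.6745  # the mathematical constant given in the text


def probable_error_of_mean(sd, n):
    """Standard deviation times 0.6745, divided by the square root of n."""
    return PROBABLE_ERROR_FACTOR * sd / math.sqrt(n)


def probable_error_of_sd(sd, n):
    """Standard deviation times 0.6745, divided by the square root of 2n."""
    return PROBABLE_ERROR_FACTOR * sd / math.sqrt(2 * n)


def probable_error_of_cv(cv_percent, n):
    """Probable error of a coefficient of variability given in percent.

    Above 10 percent the corrected form, with the extra factor
    [1 + 2 (C/100)^2]^(1/2), is applied.
    """
    error = PROBABLE_ERROR_FACTOR * cv_percent / math.sqrt(2 * n)
    if cv_percent > 10:
        error *= math.sqrt(1 + 2 * (cv_percent / 100) ** 2)
    return error


n = 331  # ASSUMED number of ears, for illustration only
print(round(probable_error_of_mean(1.28, n), 3))
print(round(probable_error_of_sd(1.28, n), 3))
print(round(probable_error_of_cv(14.5, n), 2))
```

Doubling n in any of these calls shrinks the probable error by the factor 1/√2, which is the influence of numbers remarked upon above.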
Meaning of Probable Error. It is important that the meaning
of probable error be not misunderstood. It has no reference to er- 
rors in our computations, which are assumed to be correct. It is 
not the actual magnitude of errors made nor is it the most probable 
size of any mistake, neither does it set the limits within which errors 
must lie. Such limits cannot be set, but it does mean that the 
chances are even that the true value lies within the range set by E ; 
that is, if the determination (as in mean) be 7.25 and the E be
0.02 then the chances are even that the true value is not less
than 7.23 (7.25 - 0.02) nor greater than 7.27 (7.25 + 0.02).

Of course the chances are also even that the true value may lie 
outside this range but these chances rapidly decrease as we increase 
the range. Thus the chances against the true value lying outside of 
twice the probable error are as 4.5 to 1. The following table shows
the rapid increase in the chances that the true value lies within the
range set by E, 2E, etc. They are as follows:¹

    ± E,    the chances are even.
    ± 2E,   4.5 to 1.
    ± 3E,   21 to 1.
    ± 4E,   142 to 1.
    ± 5E,   1,310 to 1.
    ± 6E,   19,200 to 1.
    ± 7E,   420,000 to 1.
    ± 8E,   17,000,000 to 1.
    ± 9E,   about a billion to one.



It will be noticed that by the time we have made an allowance of 
three or four times the probable error we have reached a chance 
which amounts to practical certainty, and even 21 to 1 involves far
less chance than is involved in most business transactions. 
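Odds of this kind can be recomputed, on the assumption that the errors follow the normal law, with the short Python sketch below. The function name is ours, and the figures it prints need not agree exactly with the printed table, which was taken from an older set of tables.

```python
import math


def odds_within(k):
    """Odds that the true value lies within k probable errors.

    Assumes normally distributed errors, so that one probable error
    corresponds to 0.6745 standard deviations.
    """
    z = k * 0.6745
    outside = math.erfc(z / math.sqrt(2.0))  # chance of falling outside +/- z
    return (1.0 - outside) / outside


for k in range(1, 10):
    print(f"{k}E: {odds_within(k):,.1f} to 1")
```

The complementary error function is used so that the very small chances at 8E and 9E are not lost to rounding.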

Degree of Confidence Shown by Probable Error. There is a
popular notion that most affairs of life rest on a positive basis of 
fact and that unless error or chance can be entirely eliminated from 
our calculations no confidence is to be placed in the results. This is 
erroneous. A large element of uncertainty is nearly always involved 
in all affairs of life whether we recognize the fact or not. 

Problems of the class now under discussion differ from ordinary
affairs of life, therefore, only in this, that we can calculate the prob- 
able error involved and by that determine the degree of confidence 
that may be reposed in the conclusions. If the probable error is 
large as compared with the determination then we have nothing but 
a shrewd guess unless we increase the numbers, but if the probable 
error is small as compared with the determination then a high de- 
gree of confidence may be placed in the results and the facts taught 
may be relied upon as practically certain. 

¹C. B. Davenport, Statistical Methods, p. 14.

For a graphic illustration of the meaning of ±E: Suppose in
the following figure the line AB represents our determination and
the lines $ab$ and $a^1b^1$ represent +E and -E as follows:

[Figure: three parallel vertical lines, labeled from left to right $ab$, AB, $a^1b^1$.]

Now this means that the chances are even that the line which
would represent the true value is not outside the limits set by the
lines $ab$ and $a^1b^1$, representing ±E.

If now we set other lines at 2E as follows,

[Figure: five parallel vertical lines, labeled from left to right $a_1b_1$, $ab$, AB, $a^1b^1$, $a^2b^2$.]

then we know from the table that the chances are 4.5 to one that
the line which represents the true value is not outside the lines $a_1b_1$
and $a^2b^2$, each removed twice the probable error from the determination.

The first of the two purposes of this bulletin is accomplished in 
the text up to this point and in the appendix. It remains to present 
certain data showing how type and variability of corn behave under 
various influences. 

INFLUENCE OF SELECTION UPON TYPE AND 
VARIABILITY. 

In 1896 Dr. C. G. Hopkins, chemist of the Experiment Station, 
began a series of breeding experiments to determine whether the 
chemical composition of corn could be influenced by selection. The 
mass of data which has accumulated during the ten years of the 
investigation affords some of the most reliable information that has 
ever been secured concerning the influence of selection upon both 
type and variability. 

The selections were made in four directions ; namely, for high 
oil, for low oil, for high protein, and for low protein, giving rise 
to four strains, known as Illinois' high-oil, Illinois' low-oil, Illinois' 
high-protein, and Illinois' low-protein. All four strains sprang
from the same original stock of 163 good ears of a local strain, 
known as Burr's White. 

Table 1 exhibits the distribution of the original 163 ears and
the effects of ten years of selection for oil content. The column
headed "Seed" gives the average of the seed ears planted both for
high oil and for low oil, and the distributions opposite show the
actual oil content of the entire crop of good ears raised therefrom.¹
The column headed "Average" gives the average content of oil as
determined by chemical analysis of the actual ears, differing slightly
from what would be the computed mean of these distributions because
in the distribution the ears are made to fall in definite classes.

¹These different strains were raised in small isolated plots, each ear in a separate row.

Nothing can exhibit more clearly than this table the readiness
with which the type responds to selection. In the fourth crop the 
high oil and low oil strains parted company, that is to say, their 
distributions no longer overlapped ; in the seventh year the entire 
low oil crop dropped below the lowest ear of the original stock, and 
in the ninth the entire high oil distribution was above the highest 
ear of the original stock. That is to say, the two strains of high 
and low oil, though developed from the same stock, had separated by 
a space wider than that covered by the original distribution or that 
of either strain. Both strains had entirely departed from the space 
occupied by the original stock and the mean oil content of the high 
oil corn was nearly twice that of the foundation, or almost three 
times that of the low oil strain. 

This illustrates the principle of progression as it is illustrated 
by no other definite data known to the writers. The distributions 
began from the very first to separate, and within four years the sep- 
aration was complete. Not only are these facts clearly established 
but the separation continued and all the distributions are normal; 
that is they slope both ways from a maximum that is not far from 
the middle point. The plain conclusion is that response to selection 
is rapid and pronounced and by persistent selection the type may be 
carried entirely beyond the former limits of the race. 

Table 2 gives in condensed form the effect of selection upon the 
mean, standard deviation, and coefficient of variability for all four 
strains of corn and through the ten years of the experiment. A 
glance at this table will show that the high and low protein strains 
followed the same general principle as indicated by the high and low 
oil strains. It will be noted, however, that the response to selection 
was upon the whole least prompt in the low protein corn.

TABLE 2. VARIABILITY OF CORN BRED FOR OIL AND PROTEIN

Percent Protein in Original Stock, 163 Ears.
Mean, 10.93 ± 0.05; Standard Deviation, 1.04 ± 0.04; Coefficient of Variability, 9.50 ± 0.35.

Percent protein in high protein selection.

    Year     Mean            Standard         Coefficient of
                             Deviation        Variability
    1897     10.99 ± 0.07    1.16 ± 0.05      10.90 ± 0.50
    1898     10.98 ± 0.05    1.22 ± 0.04      11.15 ± 0.33
    1899     11.62 ± 0.06    1.28 ± 0.04      11.00 ± 0.36
    1900     12.62 ± 0.05    1.02 ± 0.03       8.09 ± 0.26
    1901     13.78 ± 0.07    1.17 ± 0.05       8.48 ± 0.38
    1902     12.90 ± 0.08    1.10 ± 0.05       8.50 ± 0.43
    1903     13.51 ± 0.09    1.36 ± 0.06      10.04 ± 0.48
    1904     15.04 ± 0.10    1.34 ± 0.07       8.94 ± 0.43
    1905     14.71 ± 0.08    1.24 ± 0.05       8.84 ± 0.42
    1906     14.25 ± 0.08    1.33 ± 0.06       9.33 ± 0.41

Percent protein in low protein selection.

    Year     Mean            Standard         Coefficient of
                             Deviation        Variability
    1897-99  10.49 ± 0.08    1.32 ± 0.06      12.61 ± 0.53
    1897-99   9.59 ± 0.06    1.01 ± 0.04      10.50 ± 0.42
    1900      9.13 ± 0.06    1.04 ± 0.04      11.34 ± 0.45
    1901      9.63 ± 0.07    1.10 ± 0.05      11.47 ± 0.49
    1902      7.86 ± 0.05    0.75 ± 0.04       9.60 ± 0.48
    1903      8.00 ± 0.06    0.83 ± 0.04      10.41 ± 0.50
    1904      8.17 ± 0.06    0.86 ± 0.04      10.55 ± 0.51
    1905      8.55 ± 0.06    1.05 ± 0.05      12.24 ± 0.54
    1906      8.66 ± 0.06    0.93 ± 0.04      10.77 ± 0.47

[Two entries only are given for the years 1897 to 1899.]

Percent Oil in Original Stock, 163 Ears.
Mean, 4.68 ± 0.02; Standard Deviation, 0.41 ± 0.02; Coefficient of Variability, 8.83 ± 0.33.

Percent oil in high oil selection.

    Year     Mean            Standard         Coefficient of
                             Deviation        Variability
    1897     4.79 ± 0.03     0.38 ± 0.02      7.87 ± 0.42
    1898     5.10 ± 0.02     0.48 ± 0.02      9.33 ± 0.30
    1899     5.65 ± 0.03     0.42 ± 0.02      7.47 ± 0.34
    1900     6.10 ± 0.03     0.44 ± 0.02      7.26 ± 0.33
    1901     6.24 ± 0.03     0.45 ± 0.02      7.26 ± 0.31
    1902     6.25 ± 0.04     0.50 ± 0.03      8.06 ± 0.41
    1903     6.51 ± 0.03     0.46 ± 0.02      7.07 ± 0.34
    1904     7.12 ± 0.04     0.58 ± 0.03      8.19 ± 0.39
    1905     7.30 ± 0.03     0.55 ± 0.02      7.47 ± 0.33
    1906     7.37 ± 0.03     0.45 ± 0.02      6.15 ± 0.27

Percent oil in low oil selection.

    Year     Mean            Standard         Coefficient of
                             Deviation        Variability
    1897-99  3.95 ± 0.02     0.32 ± 0.01       8.13 ± 0.37
    1897-99  3.85 ± 0.02     0.32 ± 0.01       8.42 ± 0.33
    1900     3.57 ± 0.02     0.36 ± 0.01      10.13 ± 0.40
    1901     3.45 ± 0.02     0.26 ± 0.01       7.59 ± 0.32
    1902     3.00 ± 0.02     0.32 ± 0.02      10.84 ± 0.55
    1903     2.99 ± 0.02     0.23 ± 0.01       7.83 ± 0.39
    1904     2.91 ± 0.02     0.25 ± 0.01       8.45 ± 0.40
    1905     2.56 ± 0.02     0.28 ± 0.01      10.86 ± 0.55
    1906     2.66 ± 0.02     0.31 ± 0.01      11.69 ± 0.51

[Two entries only are given for the years 1897 to 1899.]


This table shows the effects of selection. It will be noted that 
the means steadily increase or decrease with selection, but that the 
coefficient of variability does not greatly change. This
all shows that the effect of selection is to shift the type without 
sensibly reducing variability. 

The chief interest in Table 2 is with respect to variability as 
shown by the standard deviations or better yet by the coefficients 
of variability of the different strains. On this point one fact is 
clear cut and significant; namely, while the different strains differ 
as to variability, the high oil upon the whole being least variable 
and the low protein most variable, yet in every instance the varia- 
bility was not sensibly reduced during the ten years of rigid selec- 
tion. 

True, it fluctuates from year to year but rarely more than is ac- 
counted for by the probable error, and it cannot be said from these 
figures that the effect of selection is greatly to reduce variability. 
This agrees with other modern studies in the field of breeding and,
in connection with the data given in Table 1, tends strongly to confirm
the statement that in general the effect of selection is to shift
the type without greatly altering variability. All of this means that 
after great improvement has been secured there is still left abundant 
variability on which to base future selection, and that if the limits 
of improvement are ever reached it will be for some reason other 
than the failure of variability. 
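The point can be illustrated from the high oil figures of Table 2 with the Python sketch below. It is only a sketch: the variable names are ours, and the coefficients are recomputed from the rounded means and standard deviations, so they may differ slightly from the printed ones.

```python
# (year, mean percent oil, standard deviation) for the high oil strain, Table 2.
HIGH_OIL = [
    (1897, 4.79, 0.38), (1898, 5.10, 0.48), (1899, 5.65, 0.42),
    (1900, 6.10, 0.44), (1901, 6.24, 0.45), (1902, 6.25, 0.50),
    (1903, 6.51, 0.46), (1904, 7.12, 0.58), (1905, 7.30, 0.55),
    (1906, 7.37, 0.45),
]

# Coefficient of variability for each crop, in percent of the mean.
cvs = [100.0 * sd / mean for _, mean, sd in HIGH_OIL]

# How far the type (the mean) moved between the first and last crops.
shift = HIGH_OIL[-1][1] / HIGH_OIL[0][1]

print(f"mean oil content multiplied by {shift:.2f} in ten crops")
print(f"coefficient of variability ranged from {min(cvs):.1f} to {max(cvs):.1f} percent")
```

The mean moves steadily in the direction of selection, while the coefficient of variability merely fluctuates within a narrow band.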

INDIRECT EFFECTS OF SELECTION 

Table 3 gives the physical characters of these four strains of
corn for two separate crops, 1905 and 1906. By this we see that 
the effect of selecting for chemical content has been also to alter the 
physical characters of the different strains ; that is to say, while the 
ears differed in the two years, 1905 and 1906, yet in both cases the 
low protein strain had the longest and the high oil strain the short- 
est ears ; the low oil corn had the largest and the high protein the 
smallest circumference; the low oil corn was the heaviest in both 
years and the high protein the lightest. Aside from this there seems 
to be a tendency to affect the number of rows. In any event in both 
these years the high oil corn had the largest number of rows and 
the low oil corn the smallest number of rows. While these two 
years are not enough to determine whether these differences will re- 
main permanent with these strains, the data presented are certainly 
sufficient to show that these four strains of corn are coming to differ 
decidedly in respect to physical characters though not as widely as 
they differ in their chemical composition, for which they were se- 
lected. 



TABLE 3. EFFECT OF SELECTION FOR CHEMICAL CONTENT UPON PHYSICAL
CHARACTERS OF CORN

LENGTH OF EAR.

                     Crop of 1905.                                  Crop of 1906.
    Chemical         Mean           Standard       Coefficient of   Mean             Standard         Coefficient of
    strain.                         deviation.     variability.                      deviation.       variability.
    High Protein     7.21 ± 0.04    1.27 ± 0.03    17.6 ± 0.4       7.880 ± 0.043    1.201 ± 0.030    15.24 ± 0.39
    Low Protein      7.80 ± 0.04    1.54 ± 0.03    19.7 ± 0.4       8.841 ± 0.050    1.364 ± 0.035    15.43 ± 0.4[?]
    High Oil         6.87 ± 0.04    1.39 ± 0.03    20.2 ± 0.4       7.606 ± 0.042    1.014 ± 0.030    13.33 ± 0.41
    Low Oil          7.48 ± 0.04    1.30 ± 0.03    17.4 ± 0.4       8.378 ± 0.056    1.510 ± 0.040    18.02 ± 0.49

CIRCUMFERENCE OF EAR.

                     Crop of 1905.                                  Crop of 1906.
    Chemical         Mean           Standard       Coefficient of   Mean             Standard         Coefficient of
    strain.                         deviation.     variability.                      deviation.       variability.
    High Protein     5.76 ± 0.01    0.44 ± 0.01    7.6 ± 0.2        5.863 ± 0.014    0.393 ± 0.010    6.70 ± 0.17
    Low Protein      6.51 ± 0.02    0.61 ± 0.01    9.4 ± 0.2        6.495 ± 0.018    0.478 ± 0.013    7.36 ± 0.19
    High Oil         6.05 ± 0.01    0.53 ± 0.01    8.8 ± 0.2        6.134 ± 0.017    0.417 ± 0.012    6.80 ± 0.20
    Low Oil          6.65 ± 0.02    0.59 ± 0.01    8.9 ± 0.2        6.717 ± 0.018    0.515 ± 0.013    7.52 ± 0.19

WEIGHT OF EAR.

                     Crop of 1905.                                  Crop of 1906.
    Chemical         Mean           Standard       Coefficient of   Mean             Standard         Coefficient of
    strain.                         deviation.     variability.                      deviation.       variability.
    High Protein     7.53 ± 0.04    2.50 ± 0.03    33.2 ± 0.4       8.289 ± 0.081    2.061 ± 0.057    24.86 ± 0.72
    Low Protein      9.66 ± 0.10    3.30 ± 0.07    34.2 ± 0.7       10.77 ± 0.11     2.680 ± 0.071    24.88 ± 0.74
    High Oil         7.79 ± 0.07    2.43 ± 0.05    31.2 ± 0.6       8.850 ± 0.075    1.763 ± 0.053    19.92 ± 0.63
    Low Oil          9.84 ± 0.08    2.87 ± 0.06    29.2 ± 0.7       11.50 ± 0.13     3.349 ± 0.091    29.13 ± 0.84

NUMBER OF ROWS IN EAR.

                     Crop of 1905.                                  Crop of 1906.
    Chemical         Mean            Standard       Coefficient of  Mean              Standard         Coefficient of
    strain.                          deviation.     variability.                      deviation.       variability.
    High Protein     13.72 ± 0.03    1.85 ± 0.02    13.5 ± 0.2      13.774 ± 0.060    1.707 ± 0.042    12.39 ± 0.31
    Low Protein      14.17 ± 0.06    1.94 ± 0.04    13.7 ± 0.3      14.597 ± 0.072    1.922 ± 0.051    13.17 ± 0.35
    High Oil         15.65 ± 0.06    2.08 ± 0.04    13.3 ± 0.3      14.712 ± 0.073    1.802 ± 0.052    12.25 ± 0.36
    Low Oil          12.80 ± 0.05    1.77 ± 0.04    13.8 ± 0.3      13.310 ± 0.067    1.897 ± 0.047    14.25 ± 0.38

EFFECT OF FERTILITY UPON TYPE AND
VARIABILITY OF CORN

Upon this point the experiments of Dr. Hopkins upon soil fertility
afford considerable information (Tables 4 and 5).

Table 4 shows the fertility treatment, the yield and the physical
characters of ear on a series of plots (201-210) under a three-year 
rotation of corn, oats and clover. All the plots were given a uni- 
form stand of two stalks to the hill except plot 210 which had three 
stalks per hill. Under the system of planting therefore the yield 
and weight of ear of necessity moved together, but the length and 
circumference are free to vary somewhat independently. By consulting
the columns of means we note that in general the ears were
both longer and larger in the higher yields, except in the thicker 
planting where the advantage of increased numbers was in a meas- 
ure offset by a decrease in size of ear. In general these figures show 
that increased fertility results in increase in both length and circum- 
ference and at a rate fairly uniform with each other and with the 
increased yield. As would be expected fertility has no effect upon 
the number of rows. 

The greater variability of plot 210 is noticeable at once with re- 
spect to both length and circumference and therefore to weight. 
Whether this is due to thickness of planting, to injury, or to the in- 
creased fertility is at first a matter of doubt. Referring to Table 5, 
two years' rotation with the same system of fertilizing as before, 
we find the same increased variability in the thickly planted plot 
(510), which was not injured, though it is less evident with respect 
to circumference than to length. This apparently confines the cause 
either to increased fertility or to thickness of planting. That it is 
not wholly or principally due to excessive fertility is likely from the 
fact that the only plot whose variability approaches that of 210 
(Table 4) is plot 200 which is unfertilized. This shows that the in- 
creased variability of plots 210 and 510 is due in part at least to 
thickness of planting, which seems to affect the length of ear rather 
more than the circumference.

APPENDIX
BY H. L. RIETZ.
GRAPHIC REPRESENTATION OF STATISTICS 

A mere tabulation of any considerable number of figures does not make it 
possible, in general, for the mind to grasp the main facts which the figures rep- 
resent; in fact, the tabulation of one thousand figures may leave no impression 
on the mind. By the graphic methods the chief characteristics of a mass of fig- 
ures are presented to the eye by means of a picture or curve. The graph gives, 
at a glance, important features which may be overlooked or which can only be 
obtained from the figures by considerable labor. 

The use of the graphic method in statistical work is very extensive, and 
every student who has to deal with complex groups of figures appreciates more 
and more this method as it enables him to perceive relations through the eye. 
It is the object of this section to show how frequency graphs are obtained from 
given data. 

Frequency Curves. Let us consider the following frequency distribution:
in which the first line of this table gives the class marks, and the second line the
corresponding frequencies ; what we propose to do here is to present a signifi- 
cant picture of this frequency distribution. For the benefit of those who find 
it easier to follow a concrete example, it may be said that this frequency distri- 
bution was not taken arbitrarily but actually represents the result of measuring 
the lengths of 800 ears of corn as a random sample taken from a large group, 
the class marks representing inches.

Draw two lines OX and OY at right angles to each other. These are called
coordinate axes. The line OX is called the x-axis and OY the y-axis. Beginning 
at O imagine equal intervals marked off along the x-axis. In the particular 
case in hand the numbers under OX indicate the number of inches represented 
by the distance from O, and the numbers on the left hand side of OY indicate
the frequency represented by the distance from O. The question as to what each 
division represents is a matter of scale, and the scale must be chosen to suit the 
particular problem in hand. These numbers may also be so chosen as to be 
class marks. At each class mark construct a perpendicular of such length, measured
upward from the x-axis, as to represent the frequency corresponding to
the given class. (See dotted lines in Fig. 1.) The straight lines $P_1P_2$, $P_2P_3$,
$P_3P_4$, ..., and $P_{12}P_{13}$ joining the upper extremities of the perpendiculars so
constructed form what is called "the frequency polygon" of the given distribution.
If a smooth curve were drawn as nearly as possible to the points $P_1$, $P_2$, $P_3$, ...,
and $P_{13}$ (through them if possible), the curve would resemble the curve in
Fig. 2 and is called the frequency curve of the given distribution.

Any point such as P (Fig. 1) represents two numbers. The one number is
represented by its distance from the y-axis and the other by its distance from 
the x-axis. The numbers which are represented by the distances of P from the 
y-axis and x-axis are called the abscissa and ordinate of P respectively. We 
may well think of each of the points which determine the frequency polygon as 
having an abscissa and ordinate which give the position of the point. 

Significance of Area Under Curve. Construct rectangles such as
ABCD and BCEF on the ordinates at class marks as midlines so that the
sides AD, BC, FE, etc., set the boundaries to measurements belonging to different
classes. Suppose now that we define unit area as equal to a rectangle
bounded by AB, AD, BC, and a line parallel to AB, and at the distance from
AB which represents one unit in frequency. Then the area of ABCD is 110
and the area of all such rectangles together represents the total population.

In drawing the frequency curve, one guide is to make the area bounded by 
the curve, the x-axis, and two ordinates equal to the sum of the areas of rec- 
tangles between the same ordinates. Then the total area under the curve repre- 
sents the total population. We shall see the importance of this representation 
of the population by an area in connection with the discussion of probable error. 
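The equality of the two areas can be seen in a small Python sketch. The class marks and frequencies below are hypothetical values invented for the illustration, not the distribution of the text; the polygon is closed by carrying it down to zero one class beyond each end.

```python
# HYPOTHETICAL distribution, for illustration only.
class_marks = [4, 5, 6, 7, 8]
frequencies = [2, 5, 9, 5, 2]
class_width = 1

# Area of the rectangles: each has the class width as base and the
# class frequency as height, so the sum is the total population.
rectangle_area = sum(f * class_width for f in frequencies)

# Area under the frequency polygon, by trapezoids between successive
# class marks, with zero frequency assumed one class beyond each end.
xs = [class_marks[0] - class_width] + class_marks + [class_marks[-1] + class_width]
ys = [0] + frequencies + [0]
polygon_area = sum(
    (ys[i] + ys[i + 1]) / 2 * (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)
)

print(rectangle_area, polygon_area)
```

With equal class widths the two areas agree exactly, which is the condition a smooth frequency curve is drawn to respect as nearly as possible.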

Choice of Scale. The question always arises in drawing a graph as to 
what scale is to be used in plotting, and unfortunately no definite rule can be 
laid down. Attention may, however, be called to a few points. First, we should 
choose a scale such that all the points can be plotted on the paper used ; for, the 
purpose of the graph may be defeated if it is not visible in its entirety. Secondly,
if the point involved in the investigation is a question of rate of increase or decrease,
we should select such a scale as to make the curve reasonably steep.
To illustrate, the sociologist presents the population of a city or country for 
successive years by a frequency curve in which years are used as class marks, 
and the corresponding populations are used as ordinates. In this case, the steepness
should give at a glance a good idea of the rate of change of the population.

Application of the Theory of Probability. What is commonly known as a law of
nature is a generalization based upon experience. Such a law can be established 
only in the sense that a high probability may be shown to exist in its favor. 
To illustrate, we may take one of the best established laws of science; namely, 
that all bodies are attracted by the earth. The evidence for this statement consists 
in the fact that of the thousands and even millions of bodies which have been 
observed, they have followed this rule without an exception. This has established 
a very high degree of probability in favor of the generalization. It is altogether 
conceivable, however, that some body is repelled by the earth, and that such a 
body will at some time be observed. Although experience has established an over- 
whelming probability against such an occurrence, we must not overlook the fact 
that experience establishes a law of nature only in the sense of establishing a high 
degree of probability in its favor. If 1000 pennies be tossed at random, there is 
nothing more uncertain than that a given penny will turn up heads, but it is a 
matter of common experience that the ratio of the number of heads to the total 
number of pennies tossed is, in general, nearly ½, and that, in general, this ratio
is more likely to approximate ½ as the number of tossings is increased.
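The behavior of the tossed pennies can be imitated with a pseudo-random number generator, as in the Python sketch below. The function name and the seed are our own choices, and the ratios printed will vary with the seed; only the general tendency toward ½ is to be expected.

```python
import random


def heads_ratio(tosses, rng):
    """Toss a fair penny `tosses` times and return the share of heads."""
    heads = sum(1 for _ in range(tosses) if rng.random() < 0.5)
    return heads / tosses


rng = random.Random(119)  # arbitrary seed, so that a run can be repeated
for tosses in (10, 1000, 100000):
    print(tosses, round(heads_ratio(tosses, rng), 4))
```

No single toss is predictable, yet the ratio for the largest number of tosses is ordinarily much the closest to ½.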

A proper view of the theory of probability is especially needed in statistical 
work because we deal with occurrences, and characters of such a nature that we 
wish to make statements in regard to large numbers of them taken together. It 
is a matter of common experience that results, such as averages, and ratios, ob- 
tained from large numbers are nearly stationary. We find the average length 
of 1000 ears of corn as a random sample taken from a larger population, and 
are surprised if, upon taking another random sample of 1,000 ears from the 
same larger population, their average differs materially from that already found. 
We are not at all surprised if they come out substantially alike. There are prob- 
ably many causes which influence the growth of each single ear, but when they 
are all taken together, these small disturbances tend to counterbalance each 
other. In short, it is regularity in large numbers which we expect. While it 
is common sense to expect this, we shall later give a mathematical measure 
known as the probable error to indicate what deviations we should expect in 
results such as averages. 

This discussion leads us to the following definition of probability: 

If, in the long run, out of n possible cases in each of which some event (or
character) occurs, or fails to occur, it occurs $n_1$ times, and fails $n - n_1$ times, the
probability that the event occurs on a particular occasion in question is $\frac{n_1}{n}$, and
the probability that it fails to occur is $\frac{n - n_1}{n}$.

Hence the expression "relative frequency" is a significant equivalent for 
"probability." In framing this definition we idealize an actual experience. When 
we say the probability of a penny falling heads is 1/2, this may be looked upon
as an answer to the following question: What part of the pennies tossed should
we expect to find heads up if we should toss an indefinitely large number? This
idealization for purposes of definition is analogous to the idealization of the
crude chalk mark into the straight line of geometry. Since the sum of the
probabilities of occurrence and failure on a particular occasion is $n_1/n + (n - n_1)/n = 1$,
the number 1 is the symbol for certainty.

We shall state the following corollary to the definition as it is sometimes
easier to apply:

If all the cases in which an event is in question can be analyzed into $n'$ cases,
each of which is equally likely, and $m'$ is the number of these cases in which the
event occurs, then $m'/n'$ is the probability that the event will occur on the occasion
in question.

For instance, in tossing two pennies, what is the probability of one head 
and one tail? 

The four different ways in which the pennies can fall are: heads-tails,
tails-heads, heads-heads, tails-tails. Two of these lead to the occurrence
of the event, and the four are equally likely. Hence 2/4, or 1/2, is the required
probability.
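
The counting of equally likely cases can be carried out mechanically. The following is a minimal sketch; the names `probability` and `one_head_one_tail` are hypothetical, introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

def probability(event, n_pennies):
    """Favorable equally likely cases divided by all equally likely cases."""
    cases = list(product("HT", repeat=n_pennies))   # HH, HT, TH, TT for two pennies
    favorable = [case for case in cases if event(case)]
    return Fraction(len(favorable), len(cases))

def one_head_one_tail(case):
    """True when the case shows exactly one head and one tail."""
    return case.count("H") == 1 and case.count("T") == 1

if __name__ == "__main__":
    print(probability(one_head_one_tail, 2))
```

The same function serves for any event that can be stated as a test on a single case, provided the cases listed really are equally likely.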

Combination of Probabilities. The probability that all of a set of independent
events will occur on an occasion in which all of them are in question is
the product of the probabilities of the separate events.

Proof: Let $p_1, p_2, \ldots, p_r$ be the separate probabilities of $r$ events. Out
of a great number $n$ of cases, the first will happen on $p_1 n$ occasions. Out of
these the second will happen on $p_2(p_1 n)$ occasions. Continuing this process
and applying the definition above, the theorem follows. To illustrate the theorem, 
suppose that among a population of 100,000 people, 30,000 are vaccinated, and 
that 500 persons have smallpox. If vaccination has no influence on smallpox, 
what is the probability that a person is both vaccinated and has smallpox? 

Since 100,000 is a large number, we may give 30,000/100,000 = 3/10 as the probability
that a person is vaccinated, and 500/100,000 = 1/200 as the probability
that a person has smallpox. Then 3/10 × 1/200 = 3/2000 is the probability
that a person is both vaccinated and has smallpox if vaccination has no influence
on smallpox. Furthermore, 3/2000 × 100,000 = 150 is the most probable number
of vaccinated persons we should expect to have smallpox if vaccination has
no influence on the number of cases of smallpox.
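
The arithmetic of this illustration may be checked with exact fractions. The variable names below are our own; the figures are those of the example.

```python
from fractions import Fraction

population = 100_000
vaccinated = 30_000
smallpox = 500

p_vaccinated = Fraction(vaccinated, population)   # 3/10
p_smallpox = Fraction(smallpox, population)       # 1/200

# If the two events are independent, the joint probability is the product.
p_both = p_vaccinated * p_smallpox                # 3/2000
expected_both = p_both * population               # 150 persons

if __name__ == "__main__":
    print(p_both, expected_both)
```

An observed number of vaccinated smallpox cases far from this expected number would be a reason to doubt the assumption of independence.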

The Normal Distribution. If 4 pennies are thrown at random, what is
the probability that exactly r of them are heads and the rest tails when r
takes values 0, 1, 2, 3, 4?

1) $(1/2)^4$ = probability of 0 heads and 4 tails.
2) $4(1/2)^4$ = probability of 1 head and 3 tails.
3) $6(1/2)^4$ = probability of 2 heads and 2 tails.
4) $4(1/2)^4$ = probability of 3 heads and 1 tail.
5) $(1/2)^4$ = probability of 4 heads and 0 tails.

In 2) the coefficient 4 appears before $(1/2)^4$ because with 4 coins there are
four different combinations possible each consisting of 1 head and 3 tails. For
similar reasons the coefficients 6 and 4 appear in 3) and 4) respectively.

The above illustration may be generalized and the result may be put into
the following form: If n coins are thrown upon a table at random, the probability
that exactly r of them are heads, and the rest tails, is given by the (r+1)st
term of the binomial expansion $(1/2 + 1/2)^n$.

That is, in other symbols, ${}^nC_r\,(1/2)^n$,

where the symbol ${}^nC_r$ indicates the number of combinations of n things taken
r at a time. In order to emphasize the fact that there is a much greater probability
of getting an almost equal number of heads and tails than of getting widely
different numbers, and in order to lead up to the normal probability curve, we
present the following table for n = 999 obtained from Quetelet's Lettres sur la
Theorie des Probabilites. As indicated in the table, columns 1 and 2 give the
number of heads and tails whose probabilities are in question, while column 3
gives the corresponding probabilities:

   1         2            3                1         2            3
(heads)   (tails)   (probability)       (heads)   (tails)   (probability)
  499       500       0.025225            450       549       0.0001863
  490       509       0.021069            440       559       0.0000209
  480       519       0.011794            430       569       0.0000016
  470       529       0.004423            420       579       0.00000004
  460       539       0.001110

It may be observed from this table that in the long run one should expect 
499 heads and 500 tails more than 600,000 times as often as 420 heads and 579 
tails. 
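
The binomial term can be evaluated directly for any n and r. The sketch below uses exact integer arithmetic; the function name `prob_heads` is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_heads(n, r):
    """Exact probability of exactly r heads among n fair coins: C(n, r) * (1/2)**n."""
    return Fraction(comb(n, r), 2 ** n)

if __name__ == "__main__":
    # Four pennies: the five terms of the expansion of (1/2 + 1/2)**4.
    print([str(prob_heads(4, r)) for r in range(5)])
    # A few entries for n = 999, to be compared with column 3 of the table.
    for heads in (499, 490, 480):
        print(heads, 999 - heads, f"{float(prob_heads(999, heads)):.6f}")
```

Working with `Fraction` avoids rounding until the final conversion to a decimal, which matters when the terms are as small as those in the tails of the table.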

In Figure 2 the results in this table are presented graphically by taking as
class marks the various combinations (499 heads and 500 tails, etc.), and taking as
ordinates the probabilities recorded in column 3 of the table.

If we had taken all the intermediate integers between those tabulated we should have
had ten times as many points which would arrange themselves on the curve in Fig. 
2. By increasing the number of coins and decreasing the horizontal scale we can 
get the plotted points as close together as we please. The curve so obtained is 
known as the normal probability curve. The curve in Fig. 2 is a close approxi- 
mation. Clearly, the probabilities can be converted into frequencies by multiply- 
ing each of them by the same large number, and then we obtain the normal 
frequency curve identical with the probability curve by merely adjusting the 
scale. 

The causes of deviations in the case of biological measurements are an- 
alogous to the causes which produce deviations in the tossing of pennies, and it 
has, furthermore, been found by experience that the frequency curves of many 
populations obtained in biology follow the normal probability curve. While 
more will be said later about distributions which are not normal, for the pres- 
ent, let us assume that we are dealing with normal distributions, and proceed to 
justify the standard deviation as a measure of variability. 

Geometrical Meaning of Standard Deviations. It should be noted that 
there are two points, A and B, Fig. 2, on the normal frequency curve such that, 
as we follow the course of the curve from left to right, the curve changes at A 
from concave upward to concave downward, and it changes at B from concave 
downward to concave upward. Such points on a curve are called points of inflexion.
The important fact is that 1/2 the distance between these two points is
the standard deviation* of the population represented by the frequency curve,
and that this distance determines the curve in a manner analogous to the way
in which the radius determines a circle. For this reason, we can say that the
standard deviation is a perfect measure of variability for a normal distribution.
For, when it is given along with the type, we can draw the curve which is com- 
pletely descriptive of variability. In other words, the form of the population 
can be reproduced. This completely justifies the use of standard deviation as a 
measure of variability for a normal distribution. 
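
The statement about the points of inflexion is proved by the calculus, but it can also be checked numerically. The following sketch assumes a normal curve centered at zero and locates the point to the right of the mean where the curvature changes sign; all function names are our own.

```python
import math

def normal_curve(x, sigma):
    """Ordinate of the normal curve with mean 0 and standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def second_difference(f, x, h=1e-3):
    """Central second difference, a numerical stand-in for the second derivative."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def inflexion_point(sigma, lo, hi, tol=1e-9):
    """Bisect for the x in [lo, hi] at which the curvature changes sign."""
    f = lambda x: normal_curve(x, sigma)
    sign_at_lo = second_difference(f, lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (second_difference(f, mid) > 0) == sign_at_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    # With sigma = 2 the change of curvature should be found close to x = 2.
    print(inflexion_point(2.0, 0.1, 6.0))
```

By symmetry the other point of inflexion lies at the same distance to the left of the mean, so that half the distance between the two is the standard deviation.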

When the distribution is not normal, the standard deviation can at most be 
considered as only approximately descriptive, but it is always a significant meas- 
ure of variability. 

If along the base line of the probability curve (Fig. 3) we measure distances
in terms of the standard deviation, letting x be any horizontal distance OP in
ordinary units and $\sigma$ the standard deviation in the same units, we can present
the following useful table of areas which correspond to $x/\sigma$. For a value of
x = OP the area concerned is bounded by the base line, the probability curve,
OY, and the line through P parallel to OY. These areas are given in Table 1
for various values of $x/\sigma$ and with such a unit of area that the total area under
the curve is unity.

*This is proved by methods involving the calculus.

TABLE 1. AREAS CORRESPONDING TO $x/\sigma$

$x/\sigma$  Area      $x/\sigma$  Area      $x/\sigma$  Area      $x/\sigma$  Area
0.00        0.0000    0.30        0.1179    0.60        0.2257    0.90        0.3159
0.01        0.0040    0.31        0.1217    0.61        0.2291    0.91        0.3186
0.02        0.0080    0.32        0.1255    0.62        0.2324    0.92        0.3212
0.03        0.0120    0.33        0.1293    0.63        0.2357    0.93        0.3238
0.04        0.0160    0.34        0.1331    0.64        0.2389    0.94        0.3264
0.05        0.0199    0.35        0.1368    0.65        0.2422    0.95        0.3289
0.06        0.0239    0.36        0.1406    0.66        0.2454    0.96        0.3315
0.07        0.0279    0.37        0.1443    0.67        0.24857*  0.97        0.3340
0.08        0.0319    0.38        0.1480    0.68        0.25175   0.98        0.3365
0.09        0.0359    0.39        0.1517    0.69        0.2549    0.99        0.3389
0.10        0.0398    0.40        0.1554    0.70        0.2580    1.00        0.3413
0.11        0.0438    0.41        0.1591    0.71        0.2612    1.10        0.3643
0.12        0.0478    0.42        0.1628    0.72        0.2642    1.20        0.3849
0.13        0.0517    0.43        0.1664    0.73        0.2673    1.30        0.4032
0.14        0.0557    0.44        0.1700    0.74        0.2704    1.40        0.4192
0.15        0.0596    0.45        0.1736    0.75        0.2734    1.50        0.4332
0.16        0.0636    0.46        0.1772    0.76        0.2764    1.60        0.4452
0.17        0.0675    0.47        0.1808    0.77        0.2794    1.70        0.4554
0.18        0.0714    0.48        0.1844    0.78        0.2823    1.80        0.4641
0.19        0.0753    0.49        0.1879    0.79        0.2852    1.90        0.4713
0.20        0.0793    0.50        0.1915    0.80        0.2881    2.00        0.4772
0.21        0.0832    0.51        0.1950    0.81        0.2910    2.10        0.4821
0.22        0.0871    0.52        0.1985    0.82        0.2939    2.20        0.4861
0.23        0.0909    0.53        0.2019    0.83        0.2967    2.30        0.4893
0.24        0.0948    0.54        0.2054    0.84        0.2995    2.40        0.4918
0.25        0.0987    0.55        0.2088    0.85        0.3023    2.50        0.4938
0.26        0.1026    0.56        0.2123    0.86        0.3051    2.60        0.4953
0.27        0.1064    0.57        0.2157    0.87        0.3078    2.70        0.4965
0.28        0.1103    0.58        0.2190    0.88        0.3106    2.80        0.4974
0.29        0.1141    0.59        0.2224    0.89        0.3133    2.90        0.4981
                                                                  3.00        0.4987

*Extra figure in 5th decimal place because we wish to use the result to this number of
places later.
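
Areas of this kind need not be read from a printed table. As a sketch, assuming the curve is the normal curve with unit total area, the area between the mean and a given $x/\sigma$ can be had from the error function in Python's standard library; the function name `area_from_mean` is our own.

```python
import math

def area_from_mean(x_over_sigma):
    """Area under the unit-area normal curve between the mean and x/sigma."""
    return 0.5 * math.erf(x_over_sigma / math.sqrt(2))

if __name__ == "__main__":
    for z in (0.50, 0.67, 1.00, 2.00, 3.00):
        print(f"{z:.2f}  {area_from_mean(z):.5f}")
```

Values computed in this way may differ from a printed table in the last decimal place on account of rounding.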



PROBABLE ERROR 

Scientific results which depend upon measurement, or which are derived 
from a random sample of a larger group must not be looked upon as ex- 
actly representative of the larger group even if the greatest care has been taken 
to eliminate sources of error. The accuracy of statistical results depends not 
only upon the nature of the data, and the accuracy of the measurements, but 
also upon the number of variates. Granted that statistical results are only ap- 
proximations, it is of first rate importance to have a criterion to indicate what 
degree of confidence is to be placed in such results. The so-called probable 
error is designed to serve this purpose.

Probable Error of Single Variate. The probable error* of a single variate
may be defined as that deviation ($\pm E$) from the mean on either side within
which just one-half of the population is contained. In other words, it is an even
wager that a variate taken at random would have a deviation greater or less
than the probable error. The conception of the probable error of a single variate
is useful only for deriving the probable errors of other results; for, it is impossible
to ascertain the probable error of a single variate until a population has
been treated. For instance, some one might go to an unknown island for the
purpose of finding the height and variability as to height of the population. Suppose
that he has no knowledge to start with in regard to the inhabitants, and
simply secures his measurements from the first inhabitant he chances to see, and
reports the result. There is no way known of finding what the probable error
is, until a population has been measured.

*We are using the term "error" here in the sense of a deviation from the most probable
value derived from a set of measurements, and we are using the mean as this most probable
value.

When the unit of area is selected as explained above the area under the
normal frequency curve becomes very significant in discussing probable error.
Remembering that the total area represents the population, if ST and S'T' (Fig. 3)
are so drawn as to include between them just one-half the area and are equally
distant from OY (the line through the mean), the distance OS or OS' represents
the probable error. From Table 1, OS = x when the area in the table is 0.25.
But, referring to the table, we have by interpolation that $x/\sigma = 0.6745$ corresponds
to an area 0.25.

That is, the probable error of a single variate is found by multiplying the 
standard deviation of the population by 0.6745. 
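
The factor 0.6745 can be recovered in two ways: by interpolating between the two five-place entries of Table 1, as is done above, or by solving directly for the value whose area is 0.25. The sketch below does both; the function and variable names are hypothetical.

```python
import math

def area_from_mean(z):
    """Area under the unit-area normal curve between the mean and z = x/sigma."""
    return 0.5 * math.erf(z / math.sqrt(2))

# Linear interpolation between the two five-place entries of Table 1.
z_lo, a_lo = 0.67, 0.24857
z_hi, a_hi = 0.68, 0.25175
z_interpolated = z_lo + (z_hi - z_lo) * (0.25 - a_lo) / (a_hi - a_lo)

def solve_for_quarter_area(tol=1e-10):
    """Bisect for the z at which the area from the mean equals 0.25."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area_from_mean(mid) < 0.25:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    print(round(z_interpolated, 4), round(solve_for_quarter_area(), 4))
```

The agreement of the two results is the reason the entries for 0.67 and 0.68 are carried to an extra decimal place in the table.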

The approximate value of the probable error of a single variate may be
found without calculation by simply marking the magnitude S' below which 1/4 of
the population lies, and S above which 1/4 of the population lies. Then $(S - S')/2$
is the probable error of a single variate. Applied to the population given on p. 27

we have that S = 7.88
S' = 6.65

S - S' = 7.88 - 6.65 = 1.23
(S - S')/2 = 0.61
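
The same rule may be applied to any list of measurements. In the sketch below the quartiles are taken with `statistics.quantiles` from Python's standard library, using its default method; the function name and the sample of eleven values are our own, and other conventions for locating quartiles would give slightly different results on small samples.

```python
import statistics

def probable_error_from_quartiles(values):
    """Half the distance between the lower and upper quartiles of the sample."""
    lower, _median, upper = statistics.quantiles(values, n=4)
    return (upper - lower) / 2

# The figures quoted above: S = 7.88 and S' = 6.65.
bulletin_value = (7.88 - 6.65) / 2   # about 0.615, recorded above as 0.61

if __name__ == "__main__":
    sample = list(range(1, 12))      # hypothetical measurements 1, 2, ..., 11
    print(probable_error_from_quartiles(sample), bulletin_value)
```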

Probable Error of the Mean. Given the mean of n variates taken at ran- 
dom from a larger group, we set the problem of finding the probable error in the 
mean. 

Imagine that we continue selecting random samples of n variates from this
group until we find a considerable number m of means from these different
samples. Let $M_1, M_2, \ldots, M_m$ be these means. They will not all be equal
to each other but will themselves constitute a population which can be represented
by a frequency curve. Such a frequency curve of means will, of course,
be much steeper than the frequency curve of the original observations. The
standard deviation of this population of means can be shown, by calculus methods,
to be equal to the standard deviation of the population of single variates divided by
the square root of n.

Now, we can apply to this population of means the same definition of probable
error that we have applied to a single observation, and if $E_M$ represents the
probable error of the mean, $E_M = 0.6745\,\sigma/\sqrt{n}$.
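
As a brief sketch of the rule that the probable error of the mean is 0.6745 times the standard deviation divided by the square root of n, the following computes it for two sizes of sample. The standard deviation of 1.2 inches is a hypothetical figure chosen for illustration, and the function name is our own.

```python
import math

PE_FACTOR = 0.6745  # ratio of probable error to standard deviation

def probable_error_of_mean(sigma, n):
    """Probable error of the mean of n variates whose standard deviation is sigma."""
    return PE_FACTOR * sigma / math.sqrt(n)

if __name__ == "__main__":
    sigma = 1.2  # hypothetical standard deviation, in inches
    for n in (100, 400):
        print(n, round(probable_error_of_mean(sigma, n), 4))
```

Taking four times as many variates only halves the probable error, which is why large gains in accuracy are costly in labor.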

Probable Error of Standard Deviation. Suppose we have found the standard
deviations of m different populations, each of n variates, and obtain
$\sigma_1, \sigma_2, \ldots, \sigma_m$. These constitute a population whose standard deviation
can be shown to be $\sigma/\sqrt{2n}$, and just as in the case of the probable error of
a single variate, the probable error $E_\sigma$ of the standard deviation is
$0.6745\,\sigma/\sqrt{2n}$.

Probable Error of Coefficient of Variability. If we have determined
$C_1, C_2, \ldots, C_m$ as the coefficients of variability of m different random samples, these
constitute a population in which $C_1, C_2, \ldots, C_m$ take the place of the measurements
in an ordinary population, and it can be shown by calculus methods that
the probable error in this statistical constant is

$$E_C = \frac{0.6745\,C}{\sqrt{2n}}\left[1 + 2\left(\frac{C}{100}\right)^2\right]^{1/2}.$$

This can be taken in the simpler form $E_C = 0.6745\,C/\sqrt{2n}$ if C does not exceed 8 or
10 per cent.
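
The effect of dropping the bracketed factor can be seen by computing both forms. The sketch assumes that the probable error of the standard deviation is $0.6745\,\sigma/\sqrt{2n}$ and that the full expression for the coefficient of variability is $0.6745\,C/\sqrt{2n}$ multiplied by $[1 + 2(C/100)^2]^{1/2}$, with C expressed in per cent; the function names are hypothetical.

```python
import math

PE_FACTOR = 0.6745

def probable_error_of_sd(sigma, n):
    """Probable error of a standard deviation found from n variates."""
    return PE_FACTOR * sigma / math.sqrt(2 * n)

def probable_error_of_cv(c_percent, n, simplified=False):
    """Probable error of a coefficient of variability C (in per cent)."""
    base = PE_FACTOR * c_percent / math.sqrt(2 * n)
    if simplified:
        return base
    return base * math.sqrt(1 + 2 * (c_percent / 100) ** 2)

if __name__ == "__main__":
    for c in (8, 10, 30):
        full = probable_error_of_cv(c, 400)
        short = probable_error_of_cv(c, 400, simplified=True)
        print(c, round(full, 4), round(short, 4))
```

For coefficients of 8 or 10 per cent the two forms differ by about one per cent of the probable error or less, so the simpler form loses nothing of practical importance.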

The probable error E in any result is sometimes defined as such a deviation 
from the true value (either above or below) that it is an even wager that the 
result obtained deviates more or less than E from the true value. The expres- 
sion "true value" is difficult of accurate definition, but may be thought of as 
meaning what would be obtained if an infinite number could be included in the 
sample. As the "true value" is an unattainable ideal so far as any important 
statistical results are concerned, it seems better to use the expression most 
probable value for our purposes. 

To be sure, there is a class of problems concerning which one may well hold 
a different view as to the true value of a determination. In dealing with certain 
populations, we may take all the individuals of the group in question, so that in 
one sense we have not limited ourselves to a random sample. For instance, we 
could easily count the number of rows on each ear of a certain plot of corn. 
Then one may say the true value of the mean can be found so far as this plot 
is concerned. But this result cannot be applied with absolute accuracy to any 
larger group of similar individuals, and such a result is of little importance for 
our problems. It is, on the contrary, of first rate importance in statistical work 
to be able to consider the given sample as representative of a larger group to 
which the results may be applied with a certain degree of accuracy.

The significance of the probable error as a measure of accuracy can be
further shown if we ask what odds should be given in a wager as to a deviation
of 2E, 3E, .... These results can at once be obtained from Table 1 and are
given on p. 15 of this bulletin. We may say, in short, that an error in a result
is as likely as not to be as great as E but it is very unlikely to be much greater.
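
Such odds follow from the areas under the normal curve. As a sketch, assuming a normal distribution of errors, the chance that a deviation falls within k probable errors is twice the tabulated area for $x/\sigma = 0.6745k$; the function names below are our own.

```python
import math

PE_FACTOR = 0.6745

def chance_within(k):
    """Probability that a normal deviation is smaller than k probable errors."""
    return math.erf(k * PE_FACTOR / math.sqrt(2))

def odds_against_exceeding(k):
    """Odds (to 1) that a deviation does not exceed k probable errors."""
    p = chance_within(k)
    return p / (1 - p)

if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, round(chance_within(k), 4), round(odds_against_exceeding(k), 1))
```

For k = 1 the wager is even, as the definition of the probable error requires, and the odds rise rapidly as k increases.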

Number Required to Give Good Representative Sample: Paucity of Data.
In making a determination of type, mean, and variability of any measurable
quantities by selecting a representative random sample, an important question
arises as to the number which should be included in the random sample. Nearly
all our quantitative knowledge of nature depends upon measuring only a portion
of a population. For instance, we say that the average height of male Caucasians
is 68.5 inches. We know that not all the statures have been taken to
get this result, but rather a comparatively small number. The question is: how
many should be taken from the whole group of male Caucasians to provide a
result which can be applied with reasonable accuracy to the whole group? Similarly,
in taking measurements on a population made up of ears of corn the
same questions arise as to the number to be taken. Of course, we should say "the
more the better" if theory and accuracy were the only considerations. But in
order to make the method practical, it is desirable to save labor by taking only
enough measurements to insure a certain degree of accuracy.

The number to be taken should depend, to a certain extent, upon the variability
of the material. If the material shows but little variability a smaller number
need be taken than if the population were much more variable.

In the work contained in this bulletin, the probable errors are such as to
indicate that the number of variates is large enough to insure satisfactory results. 
After such a determination has been made, it is not difficult in general, by con- 
siderations of the probable error, and the nature of the frequency distribution, 
to decide whether enough individuals have been taken for the purpose in view. 
We should, however, if possible, know something of the numbers to be taken 
before carrying through the work. In this connection, it is important to remem- 
ber that the probable error of an average varies inversely as the square root of 
the number of observations. We shall, furthermore, give a table (Table 2) of 
probable errors of the coefficient of variability. By experience, it is, in general, 
possible to estimate this coefficient in a case to within a few per cent. Then 
the table gives the probable errors for different numbers of variates. 
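
Because the probable error of an average varies inversely as the square root of the number of observations, the number needed for a stated probable error can be estimated beforehand when the standard deviation is roughly known. The following sketch assumes the relation that the probable error of the mean is $0.6745\,\sigma/\sqrt{n}$; the standard deviation of 1.2 inches and the target values are hypothetical, as is the function name.

```python
import math

PE_FACTOR = 0.6745

def variates_needed(sigma, target_pe):
    """Smallest n for which 0.6745 * sigma / sqrt(n) does not exceed target_pe."""
    return math.ceil((PE_FACTOR * sigma / target_pe) ** 2)

if __name__ == "__main__":
    sigma = 1.2  # hypothetical standard deviation, in inches
    for target in (0.05, 0.025):
        print(target, variates_needed(sigma, target))
```

Halving the probable error desired roughly quadruples the number of variates that must be measured.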

As to the special work on the characters in question in this bulletin, we can 
say that we have found by experience with many distributions, and by the use 
of the probable error that three or four hundred variates give results of value. 

TAKING THE MEASUREMENTS. 

a) Devices. For taking the length and circumference of ears, we have designed 
a simple caliper. This caliper is provided with two scales, the one of which reads 
the lengths and the other the circumference of ears. However, we suggest that any
corn grower desiring to take such measurements get a shoemaker's device for
getting the length of shoes. This will serve to give lengths. For getting cir- 
cumference, a tape rule will serve the purpose if not a very extensive investiga- 
tion is to be made. Or, diameters may be measured and converted into circum- 
ferences by means of the following table : 

CORRESPONDING DIAMETERS AND CIRCUMFERENCES.

Diam.   Circum.    Diam.   Circum.    Diam.   Circum.    Diam.   Circum.
1.0     3.14       1.7     5.34       2.4     7.54       3.1      9.74
1.1     3.46       1.8     5.65       2.5     7.85       3.2     10.05
1.2     3.77       1.9     5.97       2.6     8.17       3.3     10.37
1.3     4.08       2.0     6.28       2.7     8.48       3.4     10.68
1.4     4.40       2.1     6.60       2.8     8.80       3.5     11.00
1.5     4.71       2.2     6.91       2.9     9.11
1.6     5.03       2.3     7.23       3.0     9.42
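
The entries of this table are simply the diameter multiplied by π and rounded to two decimal places, so that a diameter not found in the table can be converted directly. A minimal sketch, with a function name of our own choosing:

```python
import math

def circumference(diameter):
    """Circumference of a circle of the given diameter, to two decimal places."""
    return round(math.pi * diameter, 2)

if __name__ == "__main__":
    # Diameters 1.0, 1.1, ..., 3.5, as in the table.
    for tenths in range(10, 36):
        d = tenths / 10
        print(f"{d:.1f}  {circumference(d):.2f}")
```

The conversion supposes that the ear is nearly circular at the place where the diameter is taken.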







For taking weight of ears, we have used postal scales although any kind of 
scales which weigh accurately to the nearest ounce serves the purpose. 

b) Accuracy. In the first place, it seems desirable to define what we mean 
by length, circumference, and weight. Shall we mean by length the length of 
that portion of the ear which has grown to maturity or the length of the cob? 
Similarly, shall we mean by the circumference the greatest distance around the 
ear? Likewise, do we mean by weight that taken at husking time, or that taken
at a later date?

As a matter of fact, we mean by length the length of the cob, and by cir- 
cumference the circumference taken one-third the distance from the large end 
of the ear towards the small end, and by weight the weight taken shortly after 
husking time. These measurements may, of course, be taken at any time, but 
it was convenient for our investigations to take the measurements at this time.

In beginning this investigation, the measurements of length and circumference
were taken to the nearest tenth inch. Great care must, however, be taken
to get such a close measurement, and, in general, results derived from such
measurements are no better than results derived from measurements taken with
less apparent accuracy. In fact, our experiments have shown that length may
well be taken to half inches, circumference to three-tenths inches and weights to
ounces. The closeness of measurements is closely connected with the question
of grouping measurements into classes.

c) Grouping into Classes. In forming the frequency distribution the measurements
are grouped into classes as has been shown in this bulletin. There is
no object in taking measurements with extreme accuracy and then grouping them
into broad classes. In fact, the nature of the frequency distribution with a given
grouping must help to settle the question of grouping, and this in turn the closeness
of the measurements. In short, measurements should be so grouped as to
show the variability and at the same time to leave the frequency distribution
fairly smooth. In the matter of grouping, there are two opposing tendencies:
grouping into too few classes to show variability, and grouping into too many
classes to give a smooth distribution. In the latter case, the law of distribution
is hidden because of too much detail.

We may lay it down as a general rule that the classes should be only just 
broad enough to make the distribution fairly smooth, that is, there should be no 
vacant classes except very near the extremes of the range, and a gradual in- 
crease from one extreme up to a maximum and then a gradual decrease to the 
other extreme, if there is only one maximum in the distribution as is, in general, 
the case with these populations. 
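
The rule can be turned into a rough test to apply to a trial grouping. The sketch below is a simplification of our own and somewhat stricter than the rule as stated: vacant classes are tolerated only at the very ends of the range, and the frequencies must rise to a single maximum and then fall without any reversal. The function name is hypothetical.

```python
def is_fairly_smooth(frequencies):
    """Check class frequencies (ordered by class value) against the rule of thumb."""
    counts = list(frequencies)
    # Vacant classes at the two extremes are disregarded.
    while counts and counts[0] == 0:
        counts.pop(0)
    while counts and counts[-1] == 0:
        counts.pop()
    if not counts or 0 in counts:
        return False          # nothing left, or a vacant class inside the range
    peak = counts.index(max(counts))
    rising = all(a <= b for a, b in zip(counts[:peak], counts[1:peak + 1]))
    falling = all(a >= b for a, b in zip(counts[peak:], counts[peak + 1:]))
    return rising and falling

if __name__ == "__main__":
    print(is_fairly_smooth([1, 4, 9, 12, 7, 3, 1]))   # single maximum
    print(is_fairly_smooth([3, 1, 5, 2, 6]))          # irregular: classes too narrow
```

With actual measurements a small reversal may well be accidental, so a failure of this test is a reason to try somewhat broader classes rather than a final verdict.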

In respect to grouping into classes the characters treated in this bulletin, we 
have settled upon one-half inch classes for length of ears, three-tenths inch for 
circumference, one ounce for weight and even numbers for rows. This classifi- 
cation or grouping was decided upon after experimenting with classes taken at 
more frequent intervals. 

There is a further danger of error in grouping besides the narrowness and 
broadness of classes. For example, at first we measured ears to the nearest 
tenth inch in length, then suppose we had made quarter inch groupings as 
follows : 

4, 4.25, 4.50, 4.75, 5.00, 5.25, 5.50, 5.75, 6.00, etc.

At 5.75 would be grouped all ears which measured 5.7 and 5.8 while at 5.00
would be grouped those which measured 4.9, 5.0, and 5.1. In the long run, this
would clearly result in placing more ears at 5.0 than at 5.25 other things being
equal. If we should group measurements taken to the nearest tenth inch in 0.5
inch or 0.3 inch classes, no such difficulty arises. Such a grouping as that into
quarter-inch groups would not greatly disturb the mean and variability, but would
destroy the smoothness of the distribution. Again, if we measure to quarter
inches, but group to half inches, some measurements fall on the division lines 
between classes. Then one-half a variate may be recorded in each of the classes 
between which the variate falls, or if we are dealing with large numbers one 
can alternately put such a variate into a class above, and below such a meas- 
urement. 
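
The unevenness produced by quarter-inch classes can be shown by counting how many possible tenth-inch readings fall nearest each class mark. In the sketch below all lengths are written in hundredths of an inch so that the arithmetic is exact; the function name and the range of 4 to 6 inches are our own choices, and the class marks are taken as multiples of the class width.

```python
from collections import Counter

def readings_per_class(class_width, lo=400, hi=600):
    """Count the tenth-inch readings nearest each class mark between lo and hi.

    All lengths are in hundredths of an inch: 25 is a quarter inch,
    30 is three-tenths inch, and 50 is a half inch.
    """
    counts = Counter()
    for reading in range(lo - 100, hi + 101, 10):        # every tenth-inch reading
        mark = round(reading / class_width) * class_width
        if lo <= mark <= hi:
            counts[mark] += 1
    return dict(sorted(counts.items()))

if __name__ == "__main__":
    print(readings_per_class(25))   # quarter-inch classes: uneven
    print(readings_per_class(50))   # half-inch classes: even
    print(readings_per_class(30))   # three-tenths-inch classes: even
```

With quarter-inch classes the marks at whole and half inches each receive three of the possible readings while the intermediate marks receive only two, which is the irregularity described above; with half-inch or three-tenths-inch classes every mark receives the same number.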

While many other questions may arise in taking the measurements of a 
certain character, this brief discussion covers the main difficulties in obtaining 
measurements for this bulletin. 



